Обсуждение: Pgoutput not capturing the generated columns
Hi PG Users. We are using Debezium to capture the CDC events into Kafka. With decoderbufs and wal2json plugins the connector is able to capture the generated columns in the table but not with pgoutputplugin. We tested with the following example: CREATE TABLE employees ( id SERIAL PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), full_name VARCHAR(100) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED ); // Inserted few records when the connector was running Insert into employees (first_name, last_name) VALUES ('ABC' , 'XYZ’); With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in above example. But withpgoutput the generated column was not captured. Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature? Thanks. Rajendra.
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote:
With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in above example. But with pgoutput the generated column was not captured.
wal2json materializes the generated columns before delivering the output. I
decided to materialized the generated columns in the output plugin because the
target consumers expects a complete row.
Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?
I wouldn't say limitation but a design decision.
The logical replication design decides to compute the generated columns at
subscriber side. It was a wise decision aiming optimization (it doesn't
overload the publisher that is *already* in charge of logical decoding).
Should pgoutput provide a complete row? Probably. If it is an option that
defaults to false and doesn't impact performance.
The request for features should be done in this mailing list.
Thanks Euler,
Greatly appreciate your inputs.
> Should pgoutput provide a complete row? Probably. If it is an option that defaults to false and doesn't impact performance.
Yes, it would be great if this feature can be implemented.
> The logical replication design decides to compute the generated columns at subscriber side.
If I understand correctly, this approach involves establishing a function on the subscriber's side that emulates the operation executed to derive the generated column values.
If yes, I see one potential issue where disparities might surface between the values of generated columns on the subscriber's side and those computed within Postgres. This could happen if the generated column's value relies on the current_time function.
Please let me know how can we track the feature requests and the discussions around that.
Thanks,
Rajendra.
Hi PG Hackers. We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns. Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking suchfeature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. Many thanks. Rajendra.
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal <dangwalrajendra888@gmail.com> wrote: > > Hi PG Hackers. > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns. > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking suchfeature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. The attached patch has the changes to support capturing generated column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the ‘include_generated_columns’ option is specified, the generated column information and generated column data also will be sent. Usage from pgoutput plugin: CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); CREATE publication pub1 for all tables; SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput'); SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub1', 'include_generated_columns', 'true'); Usage from test_decoding plugin: SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding'); CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); INSERT INTO gencoltable (a) VALUES (1), (2), (3); SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); Currently it is not supported as a subscription option because table sync for the generated column is not possible as copy command does not support getting data for the generated column. If this feature is required we can remove this limitation from the copy command and then add it as a subscription option later. Thoughts? Thanks and Regards, Shubham Khanna.
Вложения
Dear Shubham, Thanks for creating a patch! Here are high-level comments. 1. Please document the feature. If it is hard to describe, we should change the API. 2. Currently, the option is implemented as streaming option. Are there any reasons to choose the way? Another approach is to implement as slot option, like failover and temporary. 3. You said that subscription option is not supported for now. Not sure, is it mean that logical replication feature cannot be used for generated columns? If so, the restriction won't be acceptable. If the combination between this and initial sync is problematic, can't we exclude them in CreateSubscrition and AlterSubscription? E.g., create_slot option cannot be set if slot_name is NONE. 4. Regarding the test_decoding plugin, it has already been able to decode the generated columns. So... as the first place, is the proposed option really needed for the plugin? Why do you include it? If you anyway want to add the option, the default value should be on - which keeps current behavior. 5. Assuming that the feature become usable used for logical replicaiton. Not sure, should we change the protocol version at that time? Nodes prior than PG17 may not want receive values for generated columns. Can we control only by the option? 6. logicalrep_write_tuple() ``` - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; ``` Hmm, does above mean that generated columns are decoded even if they are not in the column list? If so, why? I think such columns should not be sent. 7. Some functions refer data->publish_generated_column many times. Can we store the value to a variable? Below comments are for test_decoding part, but they may be not needed. ===== a. pg_decode_startup() ``` + else if (strcmp(elem->defname, "include_generated_columns") == 0) ``` Other options for test_decoding do not have underscore. It should be "include-generated-columns". b. pg_decode_change() data->include_generated_columns is referred four times in the function. Can you store the value to a varibable? c. pg_decode_change() ``` - true); + true, data->include_generated_columns ); ``` Please remove the blank. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Here are some review comments for the patch v1-0001. ====== GENERAL G.1. Use consistent names It seems to add unnecessary complications by having different names for all the new options, fields and API parameters. e.g. sometimes 'include_generated_columns' e.g. sometimes 'publish_generated_columns' Won't it be better to just use identical names everywhere for everything? I don't mind which one you choose; I just felt you only need one name, not two. This comment overrides everything else in this post so whatever name you choose, make adjustments for all my other review comments as necessary. ====== G.2. Is it possible to just use the existing bms? A very large part of this patch is adding more API parameters to delegate the 'publish_generated_columns' flag value down to when it is finally checked and used. e.g. The functions: - logicalrep_write_insert(), logicalrep_write_update(), logicalrep_write_delete() ... are delegating the new parameter 'publish_generated_column' down to: - logicalrep_write_tuple The functions: - logicalrep_write_rel() ... are delegating the new parameter 'publish_generated_column' down to: - logicalrep_write_attrs AFAICT in all these places the API is already passing a "Bitmapset *columns". I was wondering if it might be possible to modify the "Bitmapset *columns" BEFORE any of those functions get called so that the "columns" BMS either does or doesn't include generated cols (as appropriate according to the option). Well, it might not be so simple because there are some NULL BMS considerations also, but I think it would be worth investigating at least, because if indeed you can find some common place (somewhere like pgoutput_change()?) where the columns BMS can be filtered to remove bits for generated cols then it could mean none of those other patch API changes are needed at all -- then the patch would only be 1/2 the size. ====== Commit message 1. Now if include_generated_columns option is specified, the generated column information and generated column data also will be sent. Usage from pgoutput plugin: SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub1', 'include_generated_columns', 'true'); Usage from test_decoding plugin: SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); ~ I think there needs to be more background information given here. This commit message doesn't seem to describe anything about what is the problem and how this patch fixes it. It just jumps straight into giving usages of a 'include_generated_columns' option. It also doesn't say that this is an option that was newly *introduced* by the patch -- it refers to it as though the reader should already know about it. Furthermore, your hacker's post says "Currently it is not supported as a subscription option because table sync for the generated column is not possible as copy command does not support getting data for the generated column. If this feature is required we can remove this limitation from the copy command and then add it as a subscription option later." IMO that all seems like the kind of information that ought to also be mentioned in this commit message. ====== contrib/test_decoding/sql/ddl.sql 2. +-- check include_generated_columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); +DROP TABLE gencoltable; + 2a. Perhaps you should include both option values to demonstrate the difference in behaviour: 'include_generated_columns', '0' 'include_generated_columns', '1' ~ 2b. I think you maybe need to include more some test combinations where there is and isn't a COLUMN LIST, because I am not 100% sure I understand the current logic/expectations for all combinations. e.g. When the generated column is in a column list but 'publish_generated_columns' is false then what should happen? etc. Also if there are any special rules then those should be mentioned in the commit message. ====== src/backend/replication/logical/proto.c 3. For all the API changes the new parameter name should be plural. /publish_generated_column/publish_generated_columns/ ~~~ 4. logical_rep_write_tuple: - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; That code seems confusing. Shouldn't the logic be exactly as also in logicalrep_write_attrs()? e.g. Shouldn't they both look like this: if (att->attisdropped) continue; if (att->attgenerated && !publish_generated_column) continue; if (!column_in_column_list(att->attnum, columns)) continue; ====== src/backend/replication/pgoutput/pgoutput.c 5. static void send_relation_and_attrs(Relation relation, TransactionId xid, LogicalDecodingContext *ctx, - Bitmapset *columns); + Bitmapset *columns, + bool publish_generated_column); Use plural. /publish_generated_column/publish_generated_columns/ ~~~ 6. parse_output_parameters bool origin_option_given = false; + bool generate_column_option_given = false; data->binary = false; data->streaming = LOGICALREP_STREAM_OFF; data->messages = false; data->two_phase = false; + data->publish_generated_column = false; I think the 1st var should be 'include_generated_columns_option_given' for consistency with the name of the actual option that was given. ====== src/include/replication/logicalproto.h 7. (Same as a previous review comment) For all the API changes the new parameter name should be plural. /publish_generated_column/publish_generated_columns/ ====== src/include/replication/pgoutput.h 8. bool publish_no_origin; + bool publish_generated_column; } PGOutputData; /publish_generated_column/publish_generated_columns/ ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi Kuroda-san, Thanks for reviewing the patch. I have fixed some of the comments > 2. > Currently, the option is implemented as streaming option. Are there any reasons > to choose the way? Another approach is to implement as slot option, like failover > and temporary. I think the current approach is appropriate. The options such as failover and temporary seem like properties of a slot and I think decoding of generated column should not be slot specific. Also adding a new option for slot may create an overhead. > 3. > You said that subscription option is not supported for now. Not sure, is it mean > that logical replication feature cannot be used for generated columns? If so, > the restriction won't be acceptable. If the combination between this and initial > sync is problematic, can't we exclude them in CreateSubscrition and AlterSubscription? > E.g., create_slot option cannot be set if slot_name is NONE. Added an option 'generated_column' for create subscription. Currently it allow to set 'generated_column' option as true only if 'copy_data' is set to false. Also we don't allow user to alter the 'generated_column' option. > 6. logicalrep_write_tuple() > > ``` > - if (!column_in_column_list(att->attnum, columns)) > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) > + continue; > ``` > > Hmm, does above mean that generated columns are decoded even if they are not in > the column list? If so, why? I think such columns should not be sent. Fixed Thanks and Regards, Shlok Kyal
Вложения
Hi, On Wed, May 8, 2024 at 4:14 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal > <dangwalrajendra888@gmail.com> wrote: > > > > Hi PG Hackers. > > > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns. > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking suchfeature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. > > The attached patch has the changes to support capturing generated > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the > ‘include_generated_columns’ option is specified, the generated column > information and generated column data also will be sent. As Euler mentioned earlier, I think it's a decision not to replicate generated columns because we don't know the target table on the subscriber has the same expression and there could be locale issues even if it looks the same. I can see that a benefit of this proposal would be to save cost to compute generated column values if the user wants the target table on the subscriber to have exactly the same data as the publisher's one. Are there other benefits or use cases? > > Usage from pgoutput plugin: > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > (a * 2) STORED); > CREATE publication pub1 for all tables; > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput'); > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > 'proto_version', '1', 'publication_names', 'pub1', > 'include_generated_columns', 'true'); > > Usage from test_decoding plugin: > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding'); > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > (a * 2) STORED); > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1', > 'include_generated_columns', '1'); > > Currently it is not supported as a subscription option because table > sync for the generated column is not possible as copy command does not > support getting data for the generated column. If this feature is > required we can remove this limitation from the copy command and then > add it as a subscription option later. > Thoughts? I think that if we want to support an option to replicate generated columns, the initial tablesync should support it too. Otherwise, we end up filling the target columns data with NULL during the initial tablesync but with replicated data during the streaming changes. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Mon, 20 May 2024 at 13:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > On Wed, May 8, 2024 at 4:14 PM Shubham Khanna > <khannashubham1197@gmail.com> wrote: > > > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal > > <dangwalrajendra888@gmail.com> wrote: > > > > > > Hi PG Hackers. > > > > > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns. > > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for trackingsuch feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. > > > > The attached patch has the changes to support capturing generated > > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the > > ‘include_generated_columns’ option is specified, the generated column > > information and generated column data also will be sent. > > As Euler mentioned earlier, I think it's a decision not to replicate > generated columns because we don't know the target table on the > subscriber has the same expression and there could be locale issues > even if it looks the same. I can see that a benefit of this proposal > would be to save cost to compute generated column values if the user > wants the target table on the subscriber to have exactly the same data > as the publisher's one. Are there other benefits or use cases? I think this will be useful mainly for the use cases where the publisher has generated columns and the subscriber does not have generated columns. In the case where both the publisher and subscriber have generated columns, the current patch will overwrite the generated column values based on the expression for the generated column in the subscriber. > > > > Usage from pgoutput plugin: > > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > > (a * 2) STORED); > > CREATE publication pub1 for all tables; > > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput'); > > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > > 'proto_version', '1', 'publication_names', 'pub1', > > 'include_generated_columns', 'true'); > > > > Usage from test_decoding plugin: > > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding'); > > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > > (a * 2) STORED); > > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > > 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include_generated_columns', '1'); > > > > Currently it is not supported as a subscription option because table > > sync for the generated column is not possible as copy command does not > > support getting data for the generated column. If this feature is > > required we can remove this limitation from the copy command and then > > add it as a subscription option later. > > Thoughts? > > I think that if we want to support an option to replicate generated > columns, the initial tablesync should support it too. Otherwise, we > end up filling the target columns data with NULL during the initial > tablesync but with replicated data during the streaming changes. +1 for supporting initial sync. Currently copy_data = true and generate_column = true are not supported, this limitation will be removed in one of the upcoming patches. Regards, Vignesh
Hi, AFAICT this v2-0001 patch differences from v1 is mostly about adding the new CREATE SUBSCRIPTION option. Specifically, I don't think it is addressing any of my previous review comments for patch v1. [1]. So these comments below are limited only to the new option code; All my previous review comments probably still apply. ====== Commit message 1. (General) The commit message is seriously lacking background explanation to describe: - What is the current behaviour w.r.t. generated columns - What is the problem with the current behaviour? - What exactly is this patch doing to address that problem? ~ 2. New option generated_option is added in create subscription. Now if this option is specified as 'true' during create subscription, generated columns in the tables, present in publisher (to which this subscription is subscribed) can also be replicated. - 2A. "generated_option" is not the name of the new option. ~ 2B. "create subscription" stmt should be UPPERCASE; will also be more readable if the option name is quoted. ~ 2C. Needs more information like under what condition is this option ignored etc. ====== doc/src/sgml/ref/create_subscription.sgml 3. + <varlistentry id="sql-createsubscription-params-with-generated-column"> + <term><literal>generated-column</literal> (<type>boolean</type>)</term> + <listitem> + <para> + Specifies whether the generated columns present in the tables + associated with the subscription should be replicated. The default is + <literal>false</literal>. + </para> + + <para> + This parameter can only be set true if copy_data is set to false. + This option works fine when a generated column (in publisher) is replicated to a + non-generated column (in subscriber). Else if it is replicated to a generated + column, it will ignore the replicated data and fill the column with computed or + default data. + </para> + </listitem> + </varlistentry> 3A. There is a typo in the name "generated-column" because we should use underscores (not hyphens) for the option names. ~ 3B. This it is not a good option name because there is no verb so it doesn't mean anything to set it true/false -- actually there IS a verb "generate" but we are not saying generate = true/false, so this name is also quite confusing. I think "include_generated_columns" would be much better, but if others think that name is too long then maybe "include_generated_cols" or "include_gen_cols" or similar. Of course, whatever if the final decision should be propagated same thru all the code comments, params, fields, etc. ~ 3C. copy_data and false should be marked up as <literal> fonts in the sgml ~ 3D. Suggest re-word this part. Don't need to explain when it "works fine". BEFORE This option works fine when a generated column (in publisher) is replicated to a non-generated column (in subscriber). Else if it is replicated to a generated column, it will ignore the replicated data and fill the column with computed or default data. SUGGESTION If the subscriber-side column is also a generated column then this option has no effect; the replicated data will be ignored and the subscriber column will be filled as normal with the subscriber-side computed or default data. ====== src/backend/commands/subscriptioncmds.c 4. AlterSubscription SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR | SUBOPT_PASSWORD_REQUIRED | SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | - SUBOPT_ORIGIN); + SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN); Hmm. Is this correct? If ALTER is not allowed (later in this patch there is a message "toggling generated_column option is not allowed." then why are we even saying that SUBOPT_GENERATED_COLUMN is a support_opt for ALTER? ~~~ 5. + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN)) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("toggling generated_column option is not allowed."))); + } 5A. I suspect this is not even needed if the 'supported_opt' is fixed per the previous comment. ~ 5B. But if this message is still needed then I think it should say "ALTER is not allowed" (not "toggling is not allowed") and also the option name should be quoted as per the new guidelines for error messages. ====== src/backend/replication/logical/proto.c 6. logicalrep_write_tuple - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + Calling column_in_column_list() might be a more expensive operation than checking just generated columns flag so maybe reverse the order and check the generated columns first for a tiny performance gain. ~~ 7. - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + continue; ditto #6 ~~ 8. logicalrep_write_attrs - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + continue; + ditto #6 ~~ 9. - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + continue; ditto #6 ====== src/include/catalog/pg_subscription.h 10. CATALOG + bool subgeneratedcolumn; /* True if generated colums must be published */ /colums/columns/ ====== src/test/regress/sql/publication.sql 11. --- error: generated column "d" can't be in list +-- ok Maybe change "ok" to say like "ok: generated cols can be in the list too" ====== 12. GENERAL - Missing CREATE SUBSCRIPTION test? GENERAL - Missing ALTER SUBSCRIPTION test? How come this patch adds a new CREATE SUBSCRIPTION option but does not seem to include any test case for that option in either the CREATE SUBSCRIPTION or ALTER SUBSCRIPTION regression tests? ====== [1] My v1 review - https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On 08.05.24 09:13, Shubham Khanna wrote: > The attached patch has the changes to support capturing generated > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the > ‘include_generated_columns’ option is specified, the generated column > information and generated column data also will be sent. It might be worth keeping half an eye on the development of virtual generated columns [0]. I think it won't be possible to include those into the replication output stream. I think having an option for including stored generated columns is in general ok. [0]: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org
> Dear Shubham, > > Thanks for creating a patch! Here are high-level comments. > 1. > Please document the feature. If it is hard to describe, we should change the API. I have added the feature in the document. > 4. > Regarding the test_decoding plugin, it has already been able to decode the > generated columns. So... as the first place, is the proposed option really needed > for the plugin? Why do you include it? > If you anyway want to add the option, the default value should be on - which keeps > current behavior. I have made the generated column options as true for test_decoding plugin so by default we will send generated column data. > 5. > Assuming that the feature become usable used for logical replicaiton. Not sure, > should we change the protocol version at that time? Nodes prior than PG17 may > not want receive values for generated columns. Can we control only by the option? I verified the backward compatibility test by using the generated column option and it worked fine. I think there is no need to make any further changes. > 7. > > Some functions refer data->publish_generated_column many times. Can we store > the value to a variable? > > Below comments are for test_decoding part, but they may be not needed. > > ===== > > a. pg_decode_startup() > > ``` > + else if (strcmp(elem->defname, "include_generated_columns") == 0) > ``` > > Other options for test_decoding do not have underscore. It should be > "include-generated-columns". > > b. pg_decode_change() > > data->include_generated_columns is referred four times in the function. > Can you store the value to a varibable? > > > c. pg_decode_change() > > ``` > - true); > + true, data->include_generated_columns ); > ``` > > Please remove the blank. Fixed. The attached v3 Patch has the changes for the same. Thanks and Regards, Shubham Khanna.
Вложения
Dear Shubham, Thanks for updating the patch! I checked your patches briefly. Here are my comments. 01. API Since the option for test_decoding is enabled by default, I think it should be renamed. E.g., "skip-generated-columns" or something. 02. ddl.sql ``` +-- check include-generated-columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1','include-generated-columns', '1'); + data +------------------------------------------------------------- + BEGIN + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 + COMMIT +(5 rows) ``` We should test non-default case, which the generated columns are not generated. 03. ddl.sql Not sure new tests are in the correct place. Do we have to add new file and move tests to it? Thought? 04. protocol.sgml Please keep the format of the sgml file. 05. protocol.sgml The option is implemented as the streaming option of pgoutput plugin, so they should be located under "Logical Streaming Replication Parameters" section. 05. AlterSubscription ``` + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN)) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("toggling generated_column option is not allowed."))); + } ``` If you don't want to support the option, you can remove SUBOPT_GENERATED_COLUMN macro from the function. But can you clarify the reason why you do not want? 07. logicalrep_write_tuple ``` - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; ``` I think changes in v2 was reverted or wrongly merged. 08. test code Can you add tests that generated columns are replicated by the logical replication? Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
On Thu, 23 May 2024 at 09:19, Shubham Khanna <khannashubham1197@gmail.com> wrote: > > > Dear Shubham, > > > > Thanks for creating a patch! Here are high-level comments. > > > 1. > > Please document the feature. If it is hard to describe, we should change the API. > > I have added the feature in the document. > > > 4. > > Regarding the test_decoding plugin, it has already been able to decode the > > generated columns. So... as the first place, is the proposed option really needed > > for the plugin? Why do you include it? > > If you anyway want to add the option, the default value should be on - which keeps > > current behavior. > > I have made the generated column options as true for test_decoding > plugin so by default we will send generated column data. > > > 5. > > Assuming that the feature become usable used for logical replicaiton. Not sure, > > should we change the protocol version at that time? Nodes prior than PG17 may > > not want receive values for generated columns. Can we control only by the option? > > I verified the backward compatibility test by using the generated > column option and it worked fine. I think there is no need to make any > further changes. > > > 7. > > > > Some functions refer data->publish_generated_column many times. Can we store > > the value to a variable? > > > > Below comments are for test_decoding part, but they may be not needed. > > > > ===== > > > > a. pg_decode_startup() > > > > ``` > > + else if (strcmp(elem->defname, "include_generated_columns") == 0) > > ``` > > > > Other options for test_decoding do not have underscore. It should be > > "include-generated-columns". > > > > b. pg_decode_change() > > > > data->include_generated_columns is referred four times in the function. > > Can you store the value to a varibable? > > > > > > c. pg_decode_change() > > > > ``` > > - true); > > + true, data->include_generated_columns ); > > ``` > > > > Please remove the blank. > > Fixed. > The attached v3 Patch has the changes for the same. Few comments: 1) Since this is removed, tupdesc variable is not required anymore: +++ b/src/backend/catalog/pg_publication.c @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, List *columns, errmsg("cannot use system column \"%s\" in publication column list", colname)); - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated) - ereport(ERROR, - errcode(ERRCODE_INVALID_COLUMN_REFERENCE), - errmsg("cannot use generated column \"%s\" in publication column list", - colname)); 2) In test_decoding include_generated_columns option is used: + else if (strcmp(elem->defname, "include_generated_columns") == 0) + { + if (elem->arg == NULL) + continue; + else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not parse value \"%s\" for parameter \"%s\"", + strVal(elem->arg), elem->defname))); + } In subscription we have used generated_column, we can try to use the same option in both places: + else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) && + strcmp(defel->defname, "generated_column") == 0) + { + if (IsSet(opts->specified_opts, SUBOPT_GENERATED_COLUMN)) + errorConflictingDefElem(defel, pstate); + + opts->specified_opts |= SUBOPT_GENERATED_COLUMN; + opts->generated_column = defGetBoolean(defel); + } 3) Tab completion can be added for create subscription to include generated_column option 4) There are few whitespace issues while applying the patch, check for git diff --check 5) Add few tests for the new option added Regards, Vignesh
Hi, Here are some review comments for the patch v3-0001. I don't think v3 addressed any of my previous review comments for patches v1 and v2. [1][2] So the comments below are limited only to the new code (i.e. the v3 versus v2 differences). Meanwhile, all my previous review comments may still apply. ====== GENERAL The patch applied gives whitespace warnings: [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150: trailing whitespace. ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202: trailing whitespace. ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730: trailing whitespace. warning: 3 lines add whitespace errors. ====== contrib/test_decoding/test_decoding.c 1. pg_decode_change MemoryContext old; + bool include_generated_columns; + I'm not really convinced this variable saves any code. ====== doc/src/sgml/protocol.sgml 2. + <varlistentry> + <term><replaceable class="parameter">include-generated-columns</replaceable></term> + <listitem> + <para> + The include-generated-columns option controls whether generated columns should be included in the string representation of tuples during logical decoding in PostgreSQL. This allows users to customize the output format based on whether they want to include these columns or not. + </para> + </listitem> + </varlistentry> 2a. Something is not correct when this name has hyphens and all the nearby parameter names do not. Shouldn't it be all uppercase like the other boolean parameter? ~ 2b. Text in the SGML file should be wrapped properly. ~ 2c. IMO the comment can be more terse and it also needs to specify that it is a boolean type, and what is the default value if not passed. SUGGESTION INCLUDE_GENERATED_COLUMNS [ boolean ] If true, then generated columns should be included in the string representation of tuples during logical decoding in PostgreSQL. The default is false. ====== src/backend/replication/logical/proto.c 3. logicalrep_write_tuple - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; 3a. This code seems overcomplicated checking the same flag multiple times. SUGGESTION if (att->attgenerated) { if (!publish_generated_column) continue; } else { if (!column_in_column_list(att->attnum, columns)) continue; } ~ 3b. The same logic occurs several times in logicalrep_write_tuple ~~~ 4. logicalrep_write_attrs if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + continue; + Shouldn't these code fragments (2x in this function) look the same as in logicalrep_write_tuple? See the above review comments. ====== src/backend/replication/pgoutput/pgoutput.c 5. maybe_send_schema TransactionId topxid = InvalidTransactionId; + bool publish_generated_column = data->publish_generated_column; I'm not convinced this saves any code, and anyway, it is not consistent with other fields in this function that are not extracted to another variable (e.g. data->streaming). ~~~ 6. pgoutput_change - + bool publish_generated_column = data->publish_generated_column; + I'm not convinced this saves any code, and anyway, it is not consistent with other fields in this function that are not extracted to another variable (e.g. data->binary). ====== [1] My v1 review - https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com [2] My v2 review - https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia