Обсуждение: Pgoutput not capturing the generated columns

Поиск
Список
Период
Сортировка

Pgoutput not capturing the generated columns

От
Rajendra Kumar Dangwal
Дата:
Hi PG Users.

We are using Debezium to capture the CDC events into Kafka.
With decoderbufs and wal2json plugins the connector is able to capture the generated columns in the table but not with
pgoutputplugin. 

We tested with the following example:

CREATE TABLE employees (
   id SERIAL PRIMARY KEY,
   first_name VARCHAR(50),
   last_name VARCHAR(50),
   full_name VARCHAR(100) GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);

// Inserted few records when the connector was running

Insert into employees (first_name, last_name) VALUES ('ABC' , 'XYZ’);


With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in above example. But
withpgoutput the generated column was not captured.  
Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?

Thanks.
Rajendra.


Re: Pgoutput not capturing the generated columns

От
"Euler Taveira"
Дата:
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote:
With decoderbufs and wal2json the connector is able to capture the generated column `full_name` in above example. But with pgoutput the generated column was not captured. 

wal2json materializes the generated columns before delivering the output. I
decided to materialized the generated columns in the output plugin because the
target consumers expects a complete row.

Is this a known limitation of pgoutput plugin? If yes, where can we request to add support for this feature?

I wouldn't say limitation but a design decision.

The logical replication design decides to compute the generated columns at
subscriber side. It was a wise decision aiming optimization (it doesn't
overload the publisher that is *already* in charge of logical decoding).

Should pgoutput provide a complete row? Probably. If it is an option that
defaults to false and doesn't impact performance.

The request for features should be done in this mailing list.


--
Euler Taveira

Re: Pgoutput not capturing the generated columns

От
Rajendra Kumar Dangwal
Дата:

Thanks Euler,

Greatly appreciate your inputs.


> Should pgoutput provide a complete row? Probably. If it is an option that defaults to false and doesn't impact performance.


Yes, it would be great if this feature can be implemented.


> The logical replication design decides to compute the generated columns at subscriber side.


If I understand correctly, this approach involves establishing a function on the subscriber's side that emulates the operation executed to derive the generated column values.

If yes, I see one potential issue where disparities might surface between the values of generated columns on the subscriber's side and those computed within Postgres. This could happen if the generated column's value relies on the current_time function.

Please let me know how can we track the feature requests and the discussions around that.

Thanks,
Rajendra.

Re: Pgoutput not capturing the generated columns

От
Rajendra Kumar Dangwal
Дата:
Hi PG Hackers.

We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
suchfeature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. 

Many thanks.
Rajendra.




Re: Pgoutput not capturing the generated columns

От
Shubham Khanna
Дата:
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
<dangwalrajendra888@gmail.com> wrote:
>
> Hi PG Hackers.
>
> We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
suchfeature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. 

The attached patch has the changes to support capturing generated
column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
‘include_generated_columns’ option is specified, the generated column
information and generated column data also will be sent.

Usage from pgoutput plugin:
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
CREATE publication pub1 for all tables;
SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

Currently it is not supported as a subscription option because table
sync for the generated column is not possible as copy command does not
support getting data for the generated column. If this feature is
required we can remove this limitation from the copy command and then
add it as a subscription option later.
Thoughts?

Thanks and Regards,
Shubham Khanna.

Вложения

RE: Pgoutput not capturing the generated columns

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Shubham,

Thanks for creating a patch! Here are high-level comments.

1.
Please document the feature. If it is hard to describe, we should change the API.

2.
Currently, the option is implemented as streaming option. Are there any reasons
to choose the way? Another approach is to implement as slot option, like failover
and temporary.

3.
You said that subscription option is not supported for now. Not sure, is it mean
that logical replication feature cannot be used for generated columns? If so,
the restriction won't be acceptable. If the combination between this and initial
sync is problematic, can't we exclude them in CreateSubscrition and AlterSubscription?
E.g., create_slot option cannot be set if slot_name is NONE.

4.
Regarding the test_decoding plugin, it has already been able to decode the
generated columns. So... as the first place, is the proposed option really needed
for the plugin? Why do you include it?
If you anyway want to add the option, the default value should be on - which keeps
current behavior.

5.
Assuming that the feature become usable used for logical replicaiton. Not sure,
should we change the protocol version at that time? Nodes prior than PG17 may
not want receive values for generated columns. Can we control only by the option?

6. logicalrep_write_tuple()

```
-        if (!column_in_column_list(att->attnum, columns))
+        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+            continue;
```

Hmm, does above mean that generated columns are decoded even if they are not in
the column list? If so, why? I think such columns should not be sent.

7.

Some functions refer data->publish_generated_column many times. Can we store
the value to a variable?

Below comments are for test_decoding part, but they may be not needed.

=====

a. pg_decode_startup()

```
+        else if (strcmp(elem->defname, "include_generated_columns") == 0)
```

Other options for test_decoding do not have underscore. It should be
"include-generated-columns".

b. pg_decode_change()

data->include_generated_columns is referred four times in the function.
Can you store the value to a varibable?


c. pg_decode_change()

```
-                                    true);
+                                    true, data->include_generated_columns );
```

Please remove the blank.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

От
Peter Smith
Дата:
Here are some review comments for the patch v1-0001.

======
GENERAL

G.1. Use consistent names

It seems to add unnecessary complications by having different names
for all the new options, fields and API parameters.

e.g. sometimes 'include_generated_columns'
e.g. sometimes 'publish_generated_columns'

Won't it be better to just use identical names everywhere for
everything? I don't mind which one you choose; I just felt you only
need one name, not two. This comment overrides everything else in this
post so whatever name you choose, make adjustments for all my other
review comments as necessary.

======

G.2. Is it possible to just use the existing bms?

A very large part of this patch is adding more API parameters to
delegate the 'publish_generated_columns' flag value down to when it is
finally checked and used. e.g.

The functions:
- logicalrep_write_insert(), logicalrep_write_update(),
logicalrep_write_delete()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_tuple

The functions:
- logicalrep_write_rel()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_attrs

AFAICT in all these places the API is already passing a "Bitmapset
*columns". I was wondering if it might be possible to modify the
"Bitmapset *columns" BEFORE any of those functions get called so that
the "columns" BMS either does or doesn't include generated cols (as
appropriate according to the option).

Well, it might not be so simple because there are some NULL BMS
considerations also, but I think it would be worth investigating at
least, because if indeed you can find some common place (somewhere
like pgoutput_change()?) where the columns BMS can be filtered to
remove bits for generated cols then it could mean none of those other
patch API changes are needed at all -- then the patch would only be
1/2 the size.

======
Commit message

1.
Now if include_generated_columns option is specified, the generated
column information and generated column data also will be sent.

Usage from pgoutput plugin:
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

~

I think there needs to be more background information given here. This
commit message doesn't seem to describe anything about what is the
problem and how this patch fixes it. It just jumps straight into
giving usages of a 'include_generated_columns' option.

It also doesn't say that this is an option that was newly *introduced*
by the patch -- it refers to it as though the reader should already
know about it.

Furthermore, your hacker's post says "Currently it is not supported as
a subscription option because table sync for the generated column is
not possible as copy command does not support getting data for the
generated column. If this feature is required we can remove this
limitation from the copy command and then add it as a subscription
option later." IMO that all seems like the kind of information that
ought to also be mentioned in this commit message.

======
contrib/test_decoding/sql/ddl.sql

2.
+-- check include_generated_columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');
+DROP TABLE gencoltable;
+

2a.
Perhaps you should include both option values to demonstrate the
difference in behaviour:

'include_generated_columns', '0'
'include_generated_columns', '1'

~

2b.
I think you maybe need to include more some test combinations where
there is and isn't a COLUMN LIST, because I am not 100% sure I
understand the current logic/expectations for all combinations.

e.g. When the generated column is in a column list but
'publish_generated_columns' is false then what should happen? etc.
Also if there are any special rules then those should be mentioned in
the commit message.

======
src/backend/replication/logical/proto.c

3.
For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

~~~

4. logical_rep_write_tuple:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;
That code seems confusing. Shouldn't the logic be exactly as also in
logicalrep_write_attrs()?

e.g. Shouldn't they both look like this:

if (att->attisdropped)
  continue;

if (att->attgenerated && !publish_generated_column)
  continue;

if (!column_in_column_list(att->attnum, columns))
  continue;
======
src/backend/replication/pgoutput/pgoutput.c

5.
 static void send_relation_and_attrs(Relation relation, TransactionId xid,
  LogicalDecodingContext *ctx,
- Bitmapset *columns);
+ Bitmapset *columns,
+ bool publish_generated_column);

Use plural. /publish_generated_column/publish_generated_columns/

~~~

6. parse_output_parameters

  bool origin_option_given = false;
+ bool generate_column_option_given = false;

  data->binary = false;
  data->streaming = LOGICALREP_STREAM_OFF;
  data->messages = false;
  data->two_phase = false;
+ data->publish_generated_column = false;

I think the 1st var should be 'include_generated_columns_option_given'
for consistency with the name of the actual option that was given.

======
src/include/replication/logicalproto.h

7.
(Same as a previous review comment)

For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

======
src/include/replication/pgoutput.h

8.
  bool publish_no_origin;
+ bool publish_generated_column;
 } PGOutputData;

/publish_generated_column/publish_generated_columns/

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

От
Shlok Kyal
Дата:
Hi Kuroda-san,

Thanks for reviewing the patch. I have fixed some of the comments
> 2.
> Currently, the option is implemented as streaming option. Are there any reasons
> to choose the way? Another approach is to implement as slot option, like failover
> and temporary.
I think the current approach is appropriate. The options such as
failover and temporary seem like properties of a slot and I think
decoding of generated column should not be slot specific. Also adding
a new option for slot may create an overhead.

> 3.
> You said that subscription option is not supported for now. Not sure, is it mean
> that logical replication feature cannot be used for generated columns? If so,
> the restriction won't be acceptable. If the combination between this and initial
> sync is problematic, can't we exclude them in CreateSubscrition and AlterSubscription?
> E.g., create_slot option cannot be set if slot_name is NONE.
Added an option 'generated_column' for create subscription. Currently
it allow to set 'generated_column' option as true only if 'copy_data'
is set to false.
Also we don't allow user to alter the 'generated_column' option.

> 6. logicalrep_write_tuple()
>
> ```
> -        if (!column_in_column_list(att->attnum, columns))
> +        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> +            continue;
> ```
>
> Hmm, does above mean that generated columns are decoded even if they are not in
> the column list? If so, why? I think such columns should not be sent.
Fixed

Thanks and Regards,
Shlok Kyal

Вложения

Re: Pgoutput not capturing the generated columns

От
Masahiko Sawada
Дата:
Hi,

On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> <dangwalrajendra888@gmail.com> wrote:
> >
> > Hi PG Hackers.
> >
> > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking
suchfeature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. 
>
> The attached patch has the changes to support capturing generated
> column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> ‘include_generated_columns’ option is specified, the generated column
> information and generated column data also will be sent.

As Euler mentioned earlier, I think it's a decision not to replicate
generated columns because we don't know the target table on the
subscriber has the same expression and there could be locale issues
even if it looks the same. I can see that a benefit of this proposal
would be to save cost to compute generated column values if the user
wants the target table on the subscriber to have exactly the same data
as the publisher's one. Are there other benefits or use cases?

>
> Usage from pgoutput plugin:
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> CREATE publication pub1 for all tables;
> SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> Currently it is not supported as a subscription option because table
> sync for the generated column is not possible as copy command does not
> support getting data for the generated column. If this feature is
> required we can remove this limitation from the copy command and then
> add it as a subscription option later.
> Thoughts?

I think that if we want to support an option to replicate generated
columns, the initial tablesync should support it too. Otherwise, we
end up filling the target columns data with NULL during the initial
tablesync but with replicated data during the streaming changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Pgoutput not capturing the generated columns

От
vignesh C
Дата:
On Mon, 20 May 2024 at 13:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197@gmail.com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888@gmail.com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for
trackingsuch feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. 
> >
> > The attached patch has the changes to support capturing generated
> > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> > ‘include_generated_columns’ option is specified, the generated column
> > information and generated column data also will be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?

I think this will be useful mainly for the use cases where the
publisher has generated columns and the subscriber does not have
generated  columns.
In the case where both the publisher and subscriber have generated
columns, the current patch will overwrite the generated column values
based on the expression for the generated column in the subscriber.

> >
> > Usage from pgoutput plugin:
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > CREATE publication pub1 for all tables;
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > Currently it is not supported as a subscription option because table
> > sync for the generated column is not possible as copy command does not
> > support getting data for the generated column. If this feature is
> > required we can remove this limitation from the copy command and then
> > add it as a subscription option later.
> > Thoughts?
>
> I think that if we want to support an option to replicate generated
> columns, the initial tablesync should support it too. Otherwise, we
> end up filling the target columns data with NULL during the initial
> tablesync but with replicated data during the streaming changes.

+1 for supporting initial sync.
Currently copy_data = true and generate_column = true are not
supported, this limitation will be removed in one of the upcoming
patches.

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

От
Peter Smith
Дата:
Hi,

AFAICT this v2-0001 patch differences from v1 is mostly about adding
the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
addressing any of my previous review comments for patch v1. [1]. So
these comments below are limited only to the new option code; All my
previous review comments probably still apply.

======
Commit message

1. (General)
The commit message is seriously lacking background explanation to describe:
- What is the current behaviour w.r.t. generated columns
- What is the problem with the current behaviour?
- What exactly is this patch doing to address that problem?

~

2.
New option generated_option is added in create subscription. Now if this
option is specified as 'true' during create subscription, generated
columns in the tables, present in publisher (to which this subscription is
subscribed) can also be replicated.

-

2A.
"generated_option" is not the name of the new option.

~

2B.
"create subscription" stmt should be UPPERCASE; will also be more
readable if the option name is quoted.

~

2C.
Needs more information like under what condition is this option ignored etc.

======
doc/src/sgml/ref/create_subscription.sgml

3.
+       <varlistentry id="sql-createsubscription-params-with-generated-column">
+        <term><literal>generated-column</literal> (<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the generated columns present in the tables
+          associated with the subscription should be replicated. The default is
+          <literal>false</literal>.
+         </para>
+
+         <para>
+          This parameter can only be set true if copy_data is set to false.
+          This option works fine when a generated column (in
publisher) is replicated to a
+          non-generated column (in subscriber). Else if it is
replicated to a generated
+          column, it will ignore the replicated data and fill the
column with computed or
+          default data.
+         </para>
+        </listitem>
+       </varlistentry>

3A.
There is a typo in the name "generated-column" because we should use
underscores (not hyphens) for the option names.

~

3B.
This it is not a good option name because there is no verb so it
doesn't mean anything to set it true/false -- actually there IS a verb
"generate" but we are not saying generate = true/false, so this name
is also quite confusing.

I think "include_generated_columns" would be much better, but if
others think that name is too long then maybe "include_generated_cols"
or "include_gen_cols" or similar. Of course, whatever if the final
decision should be propagated same thru all the code comments, params,
fields, etc.

~

3C.
copy_data and false should be marked up as <literal> fonts in the sgml

~

3D.

Suggest re-word this part. Don't need to explain when it "works fine".

BEFORE
This option works fine when a generated column (in publisher) is
replicated to a non-generated column (in subscriber). Else if it is
replicated to a generated column, it will ignore the replicated data
and fill the column with computed or default data.

SUGGESTION
If the subscriber-side column is also a generated column then this
option has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

======
src/backend/commands/subscriptioncmds.c

4. AlterSubscription
    SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
    SUBOPT_PASSWORD_REQUIRED |
    SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);

Hmm. Is this correct? If ALTER is not allowed (later in this patch
there is a message "toggling generated_column option is not allowed."
then why are we even saying that SUBOPT_GENERATED_COLUMN is a
support_opt for ALTER?

~~~

5.
+ if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("toggling generated_column option is not allowed.")));
+ }

5A.
I suspect this is not even needed if the 'supported_opt' is fixed per
the previous comment.

~

5B.
But if this message is still needed then I think it should say "ALTER
is not allowed" (not "toggling is not allowed") and also the option
name should be quoted as per the new guidelines for error messages.

======
src/backend/replication/logical/proto.c


6. logicalrep_write_tuple

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+

Calling column_in_column_list() might be a more expensive operation
than checking just generated columns flag so maybe reverse the order
and check the generated columns first for a tiny performance gain.

~~

7.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

~~

8. logicalrep_write_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

ditto #6

~~

9.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

======
src/include/catalog/pg_subscription.h


10. CATALOG

+ bool subgeneratedcolumn; /* True if generated colums must be published */

/colums/columns/

======
src/test/regress/sql/publication.sql

11.
--- error: generated column "d" can't be in list
+-- ok


Maybe change "ok" to say like "ok: generated cols can be in the list too"

======

12.
GENERAL - Missing CREATE SUBSCRIPTION test?
GENERAL - Missing ALTER SUBSCRIPTION test?

How come this patch adds a new CREATE SUBSCRIPTION option but does not
seem to include any test case for that option in either the CREATE
SUBSCRIPTION or ALTER SUBSCRIPTION regression tests?

======
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Pgoutput not capturing the generated columns

От
Peter Eisentraut
Дата:
On 08.05.24 09:13, Shubham Khanna wrote:
> The attached patch has the changes to support capturing generated
> column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> ‘include_generated_columns’ option is specified, the generated column
> information and generated column data also will be sent.

It might be worth keeping half an eye on the development of virtual 
generated columns [0].  I think it won't be possible to include those 
into the replication output stream.

I think having an option for including stored generated columns is in 
general ok.


[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org



Re: Pgoutput not capturing the generated columns

От
Shubham Khanna
Дата:
> Dear Shubham,
>
> Thanks for creating a patch! Here are high-level comments.

> 1.
> Please document the feature. If it is hard to describe, we should change the API.

I have added the feature in the document.

> 4.
> Regarding the test_decoding plugin, it has already been able to decode the
> generated columns. So... as the first place, is the proposed option really needed
> for the plugin? Why do you include it?
> If you anyway want to add the option, the default value should be on - which keeps
> current behavior.

I have made the generated column options as true for test_decoding
plugin so by default we will send generated column data.

> 5.
> Assuming that the feature become usable used for logical replicaiton. Not sure,
> should we change the protocol version at that time? Nodes prior than PG17 may
> not want receive values for generated columns. Can we control only by the option?

I verified the backward compatibility test by using the generated
column option and it worked fine. I think there is no need to make any
further changes.

> 7.
>
> Some functions refer data->publish_generated_column many times. Can we store
> the value to a variable?
>
> Below comments are for test_decoding part, but they may be not needed.
>
> =====
>
> a. pg_decode_startup()
>
> ```
> +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> ```
>
> Other options for test_decoding do not have underscore. It should be
> "include-generated-columns".
>
> b. pg_decode_change()
>
> data->include_generated_columns is referred four times in the function.
> Can you store the value to a varibable?
>
>
> c. pg_decode_change()
>
> ```
> -                                    true);
> +                                    true, data->include_generated_columns );
> ```
>
> Please remove the blank.

Fixed.
The attached v3 Patch has the changes for the same.

Thanks and Regards,
Shubham Khanna.

Вложения

RE: Pgoutput not capturing the generated columns

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Shubham,

Thanks for updating the patch! I checked your patches briefly. Here are my comments.

01. API

Since the option for test_decoding is enabled by default, I think it should be renamed.
E.g., "skip-generated-columns" or something.

02. ddl.sql

```
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts',
'1','include-generated-columns', '1');
 
+                            data                             
+-------------------------------------------------------------
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
```

We should test non-default case, which the generated columns are not generated.

03. ddl.sql

Not sure new tests are in the correct place. Do we have to add new file and move tests to it?
Thought?

04. protocol.sgml

Please keep the format of the sgml file.

05. protocol.sgml

The option is implemented as the streaming option of pgoutput plugin, so they should be
located under "Logical Streaming Replication Parameters" section.

05. AlterSubscription

```
+                if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+                {
+                    ereport(ERROR,
+                            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                             errmsg("toggling generated_column option is not allowed.")));
+                }
```

If you don't want to support the option, you can remove SUBOPT_GENERATED_COLUMN
macro from the function. But can you clarify the reason why you do not want?

07. logicalrep_write_tuple

```
-        if (!column_in_column_list(att->attnum, columns))
+        if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+            continue;
+
+        if (att->attgenerated && !publish_generated_column)
             continue;
```

I think changes in v2 was reverted or wrongly merged.

08. test code

Can you add tests that generated columns are replicated by the logical replication?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Pgoutput not capturing the generated columns

От
vignesh C
Дата:
On Thu, 23 May 2024 at 09:19, Shubham Khanna
<khannashubham1197@gmail.com> wrote:
>
> > Dear Shubham,
> >
> > Thanks for creating a patch! Here are high-level comments.
>
> > 1.
> > Please document the feature. If it is hard to describe, we should change the API.
>
> I have added the feature in the document.
>
> > 4.
> > Regarding the test_decoding plugin, it has already been able to decode the
> > generated columns. So... as the first place, is the proposed option really needed
> > for the plugin? Why do you include it?
> > If you anyway want to add the option, the default value should be on - which keeps
> > current behavior.
>
> I have made the generated column options as true for test_decoding
> plugin so by default we will send generated column data.
>
> > 5.
> > Assuming that the feature become usable used for logical replicaiton. Not sure,
> > should we change the protocol version at that time? Nodes prior than PG17 may
> > not want receive values for generated columns. Can we control only by the option?
>
> I verified the backward compatibility test by using the generated
> column option and it worked fine. I think there is no need to make any
> further changes.
>
> > 7.
> >
> > Some functions refer data->publish_generated_column many times. Can we store
> > the value to a variable?
> >
> > Below comments are for test_decoding part, but they may be not needed.
> >
> > =====
> >
> > a. pg_decode_startup()
> >
> > ```
> > +        else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > ```
> >
> > Other options for test_decoding do not have underscore. It should be
> > "include-generated-columns".
> >
> > b. pg_decode_change()
> >
> > data->include_generated_columns is referred four times in the function.
> > Can you store the value to a varibable?
> >
> >
> > c. pg_decode_change()
> >
> > ```
> > -                                    true);
> > +                                    true, data->include_generated_columns );
> > ```
> >
> > Please remove the blank.
>
> Fixed.
> The attached v3 Patch has the changes for the same.

Few comments:
1) Since this is removed, tupdesc variable is not required anymore:
+++ b/src/backend/catalog/pg_publication.c
@@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel,
List *columns,
                                        errmsg("cannot use system
column \"%s\" in publication column list",
                                                   colname));

-               if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-                       ereport(ERROR,
-
errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-                                       errmsg("cannot use generated
column \"%s\" in publication column list",
-                                                  colname));

2) In test_decoding include_generated_columns option is used:
+               else if (strcmp(elem->defname,
"include_generated_columns") == 0)
+               {
+                       if (elem->arg == NULL)
+                               continue;
+                       else if (!parse_bool(strVal(elem->arg),
&data->include_generated_columns))
+                               ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                                errmsg("could not
parse value \"%s\" for parameter \"%s\"",
+
strVal(elem->arg), elem->defname)));
+               }

In subscription we have used generated_column, we can try to use the
same option in both places:
+               else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
+                                strcmp(defel->defname,
"generated_column") == 0)
+               {
+                       if (IsSet(opts->specified_opts,
SUBOPT_GENERATED_COLUMN))
+                               errorConflictingDefElem(defel, pstate);
+
+                       opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
+                       opts->generated_column = defGetBoolean(defel);
+               }

3) Tab completion can be added for create subscription to include
generated_column option

4) There are few whitespace issues while applying the patch, check for
git diff --check

5) Add few tests for the new option added

Regards,
Vignesh



Re: Pgoutput not capturing the generated columns

От
Peter Smith
Дата:
Hi,

Here are some review comments for the patch v3-0001.

I don't think v3 addressed any of my previous review comments for
patches v1 and v2. [1][2]

So the comments below are limited only to the new code (i.e. the v3
versus v2 differences). Meanwhile, all my previous review comments may
still apply.

======
GENERAL

The patch applied gives whitespace warnings:

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
trailing whitespace.
warning: 3 lines add whitespace errors.

======
contrib/test_decoding/test_decoding.c

1. pg_decode_change

  MemoryContext old;
+ bool include_generated_columns;
+

I'm not really convinced this variable saves any code.

======
doc/src/sgml/protocol.sgml

2.
+        <varlistentry>
+         <term><replaceable
class="parameter">include-generated-columns</replaceable></term>
+         <listitem>
+        <para>
+        The include-generated-columns option controls whether
generated columns should be included in the string representation of
tuples during logical decoding in PostgreSQL. This allows users to
customize the output format based on whether they want to include
these columns or not.
+         </para>
+         </listitem>
+         </varlistentry>

2a.
Something is not correct when this name has hyphens and all the nearby
parameter names do not. Shouldn't it be all uppercase like the other
boolean parameter?

~

2b.
Text in the SGML file should be wrapped properly.

~

2c.
IMO the comment can be more terse and it also needs to specify that it
is a boolean type, and what is the default value if not passed.

SUGGESTION

INCLUDE_GENERATED_COLUMNS [ boolean ]

If true, then generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

======
src/backend/replication/logical/proto.c

3. logicalrep_write_tuple

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;

3a.
This code seems overcomplicated checking the same flag multiple times.

SUGGESTION
if (att->attgenerated)
{
  if (!publish_generated_column)
    continue;
}
else
{
  if (!column_in_column_list(att->attnum, columns))
    continue;
}

~

3b.
The same logic occurs several times in logicalrep_write_tuple

~~~

4. logicalrep_write_attrs

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

Shouldn't these code fragments (2x in this function) look the same as
in logicalrep_write_tuple? See the above review comments.

======
src/backend/replication/pgoutput/pgoutput.c

5. maybe_send_schema

  TransactionId topxid = InvalidTransactionId;
+ bool publish_generated_column = data->publish_generated_column;

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->streaming).

~~~

6. pgoutput_change
-
+ bool publish_generated_column = data->publish_generated_column;
+

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->binary).

======
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com
[2] My v2 review -
https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia