Обсуждение: Improvements and additions to COPY progress reporting
Hi, With [0] we got COPY progress reporting. Before the column names of this newly added view are effectively set in stone with the release of pg14, I propose the following set of relatively small patches. These are v2, because it is a patchset that is based on a set of patches that I previously posted in [0]. 0001 Adds a column to pg_stat_progress_copy which details the amount of tuples that were excluded from insertion by the WHERE clause of the COPY FROM command. 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead of 'line'-terminology. 'Line' doesn't make sense in the binary copy case, and only for the 'text' copy format there can be a guarantee that the source / output file actually contains the reported amount of lines, whereas the amount of data tuples (which is also what it's called internally) is guaranteed to equal for all data types. There was some discussion about this in [0] where the author thought 'line' is more consistent with the CSV documentation, and where I argued that 'tuple' is both more consistent with the rest of the progress reporting tables and more consistent with the actual counted items: these are the tuples serialized / inserted (as noted in the CSV docs; "Thus the files are not strictly one line per table row like text-format files."). Patch 0003 adds backlinks to the progress reporting docs from the docs of the commands that have progress reporting (re/index, cluster, vacuum, etc.) such that progress reporting is better discoverable from the relevant commands, and removes the datname column from the progress_copy view (that column was never committed). This too should be fairly trivial and uncontroversial. 0004 adds the 'command' column to the progress_copy view; which distinguishes between COPY FROM and COPY TO. The two commands are (in my opinion) significantly different enough to warrant this column; similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY] which also report that information. I believe that this change is appropriate; as the semantics of the columns change depending on the command being executed. Lastly, 0005 adds 'io_target' to the reported information, that is, FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily be determined based on the commands in pg_stat_activity, it is reasonably something that a user would want to query on, as the origin/target of COPY has security and performance implications, whereas other options (e.g. format) are less interesting for clients that are not executing that specific COPY command. Of special interest in 0005 is that it reports the io_target for the logical replications' initial tablesyncs' internal COPY. This would otherwise be measured, but no knowledge about the type of copy (or its origin) would be available on the worker's side. I'm not married to this patch 0005, but I believe it could be useful, and therefore included it in the patchset. With regards, Matthias van de Meent. [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
Вложения
- v2-0005-Add-a-io_target-column-to-the-copy-progress-view.patch
- v2-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch
- v2-0001-Add-progress-reporting-for-excluded-rows.patch
- v2-0003-Add-backlinks-to-progress-reporting-documentation.patch
- v2-0004-Add-a-command-column-to-the-copy-progress-view.patch
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal: > > Hi, > > With [0] we got COPY progress reporting. Before the column names of > this newly added view are effectively set in stone with the release of > pg14, I propose the following set of relatively small patches. These > are v2, because it is a patchset that is based on a set of patches > that I previously posted in [0]. Hello. I had this in my backlog to revisit this feature as well before the release. Thanks for picking this up. > 0001 Adds a column to pg_stat_progress_copy which details the amount > of tuples that were excluded from insertion by the WHERE clause of the > COPY FROM command. > > 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead > of 'line'-terminology. 'Line' doesn't make sense in the binary copy > case, and only for the 'text' copy format there can be a guarantee > that the source / output file actually contains the reported amount of > lines, whereas the amount of data tuples (which is also what it's > called internally) is guaranteed to equal for all data types. > > There was some discussion about this in [0] where the author thought > 'line' is more consistent with the CSV documentation, and where I > argued that 'tuple' is both more consistent with the rest of the > progress reporting tables and more consistent with the actual counted > items: these are the tuples serialized / inserted (as noted in the CSV > docs; "Thus the files are not strictly one line per table row like > text-format files."). As an mentioned author I have no preference over line or tuple terminology here. For some cases "line" terminology fits better, for some "tuple" one. Docs can be improved later if needed to make it clear at some cases (for example in most common case probably - CSV import/export) tuple equals one line in CSV. > Patch 0003 adds backlinks to the progress reporting docs from the docs > of the commands that have progress reporting (re/index, cluster, > vacuum, etc.) such that progress reporting is better discoverable from > the relevant commands, and removes the datname column from the > progress_copy view (that column was never committed). This too should > be fairly trivial and uncontroversial. > > 0004 adds the 'command' column to the progress_copy view; which > distinguishes between COPY FROM and COPY TO. The two commands are (in > my opinion) significantly different enough to warrant this column; > similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY] > which also report that information. I believe that this change is > appropriate; as the semantics of the columns change depending on the > command being executed. This was part of my initial patch as well, but I decided to strip it out to make the final patch as small as possible to make it quickly mergeable without need of further discussion. From my side this is useful to have directly in the progress report as well. > Lastly, 0005 adds 'io_target' to the reported information, that is, > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily > be determined based on the commands in pg_stat_activity, it is > reasonably something that a user would want to query on, as the > origin/target of COPY has security and performance implications, > whereas other options (e.g. format) are less interesting for clients > that are not executing that specific COPY command. Similar (simplified, not supporting CALLBACK) info was also part of the initial patch and stripped out later. I'm also +1 on this info being useful to have directly in the progress report. > Of special interest in 0005 is that it reports the io_target for the > logical replications' initial tablesyncs' internal COPY. This would > otherwise be measured, but no knowledge about the type of copy (or its > origin) would be available on the worker's side. I'm not married to > this patch 0005, but I believe it could be useful, and therefore > included it in the patchset. All patches seem good to me. I was able to apply them to current clean master and "make check" has succeeded without problems. > > With regards, > > Matthias van de Meent. > > > [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
On Tue, Feb 9, 2021 at 12:06 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > With [0] we got COPY progress reporting. Before the column names of > this newly added view are effectively set in stone with the release of > pg14, I propose the following set of relatively small patches. These > are v2, because it is a patchset that is based on a set of patches > that I previously posted in [0]. Thanks for working on the patches. Here are some comments: 0001 - +1 to add tuples_excluded and the patch LGTM. 0002 - Yes, the tuples_processed or tuples_excluded makes more sense to me than lines_processed and lines_excluded. The patch LGTM. 0003 - Instead of just adding the progress reporting to "See also" sections in the footer of the respective pages analyze, cluster and others, it would be nice if we have a mention of it in the description as pg_basebackup has something like below: <para> Whenever <application>pg_basebackup</application> is taking a base backup, the server's <structname>pg_stat_progress_basebackup</structname> view will report the progress of the backup. See <xref linkend="basebackup-progress-reporting"/> for details. 0004 - 1) How about PROGRESS_COPY_COMMAND_TYPE instead of PROGRESS_COPY_COMMAND? The names looks bit confusing with the existing PROGRESS_COMMAND_COPY. 0005 - 1) How about + or <literal>CALLBACK</literal> (used in the table synchronization background + worker). instead of + or <literal>CALLBACK</literal> (used in the tablesync background + worker). Because "table synchronization" is being used in logical-replication.sgml. 2) I think cstate->copy_src = COPY_CALLBACK is assigned after the switch case added in copyfrom.c if (data_source_cb) { cstate->copy_src = COPY_CALLBACK; cstate->data_source_cb = data_source_cb; } Also, you can add this to the current commitfest. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal: > > Hi, > > With [0] we got COPY progress reporting. Before the column names of > this newly added view are effectively set in stone with the release of > pg14, I propose the following set of relatively small patches. These > are v2, because it is a patchset that is based on a set of patches > that I previously posted in [0]. > > 0001 Adds a column to pg_stat_progress_copy which details the amount > of tuples that were excluded from insertion by the WHERE clause of the > COPY FROM command. > > 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead > of 'line'-terminology. 'Line' doesn't make sense in the binary copy > case, and only for the 'text' copy format there can be a guarantee > that the source / output file actually contains the reported amount of > lines, whereas the amount of data tuples (which is also what it's > called internally) is guaranteed to equal for all data types. > > There was some discussion about this in [0] where the author thought > 'line' is more consistent with the CSV documentation, and where I > argued that 'tuple' is both more consistent with the rest of the > progress reporting tables and more consistent with the actual counted > items: these are the tuples serialized / inserted (as noted in the CSV > docs; "Thus the files are not strictly one line per table row like > text-format files."). > > Patch 0003 adds backlinks to the progress reporting docs from the docs > of the commands that have progress reporting (re/index, cluster, > vacuum, etc.) such that progress reporting is better discoverable from > the relevant commands, and removes the datname column from the > progress_copy view (that column was never committed). This too should > be fairly trivial and uncontroversial. > > 0004 adds the 'command' column to the progress_copy view; which > distinguishes between COPY FROM and COPY TO. The two commands are (in > my opinion) significantly different enough to warrant this column; > similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY] > which also report that information. I believe that this change is > appropriate; as the semantics of the columns change depending on the > command being executed. > > Lastly, 0005 adds 'io_target' to the reported information, that is, > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily > be determined based on the commands in pg_stat_activity, it is > reasonably something that a user would want to query on, as the > origin/target of COPY has security and performance implications, > whereas other options (e.g. format) are less interesting for clients > that are not executing that specific COPY command. I took a little deeper look and I'm not sure if I understand FILE and STDIO. I have finally tried to finalize some initial regress testing of COPY command progress using triggers. I have attached the initial patch applicable to your changes. As you can see COPY FROM STDIN is reported as FILE. That's probably expected, but it is a little confusing for me since STDIN and STDIO sound similar. What is the purpose of STDIO? When is the COPY command reported with io_target of STDIO? > Of special interest in 0005 is that it reports the io_target for the > logical replications' initial tablesyncs' internal COPY. This would > otherwise be measured, but no knowledge about the type of copy (or its > origin) would be available on the worker's side. I'm not married to > this patch 0005, but I believe it could be useful, and therefore > included it in the patchset. > > > With regards, > > Matthias van de Meent. > > > [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
Вложения
On Tue, 9 Feb 2021 at 09:32, Josef Šimánek <josef.simanek@gmail.com> wrote: > > po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent > <boekewurm+postgres@gmail.com> napsal: > > Lastly, 0005 adds 'io_target' to the reported information, that is, > > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily > > be determined based on the commands in pg_stat_activity, it is > > reasonably something that a user would want to query on, as the > > origin/target of COPY has security and performance implications, > > whereas other options (e.g. format) are less interesting for clients > > that are not executing that specific COPY command. > > I took a little deeper look and I'm not sure if I understand FILE and > STDIO. I have finally tried to finalize some initial regress testing > of COPY command progress using triggers. I have attached the initial > patch applicable to your changes. As you can see COPY FROM STDIN is > reported as FILE. That's probably expected, but it is a little > confusing for me since STDIN and STDIO sound similar. What is the > purpose of STDIO? When is the COPY command reported with io_target of > STDIO? I checked for the type of the copy_src before it was correctly set, therefore only reporting FILE type, but this will be fixed shortly in v3. Matthias
út 9. 2. 2021 v 12:51 odesílatel 0010203112132233 <boekewurm@gmail.com> napsal: > > On Tue, 9 Feb 2021 at 09:32, Josef Šimánek <josef.simanek@gmail.com> wrote: > > > > po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent > > <boekewurm+postgres@gmail.com> napsal: > > > Lastly, 0005 adds 'io_target' to the reported information, that is, > > > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily > > > be determined based on the commands in pg_stat_activity, it is > > > reasonably something that a user would want to query on, as the > > > origin/target of COPY has security and performance implications, > > > whereas other options (e.g. format) are less interesting for clients > > > that are not executing that specific COPY command. > > > > I took a little deeper look and I'm not sure if I understand FILE and > > STDIO. I have finally tried to finalize some initial regress testing > > of COPY command progress using triggers. I have attached the initial > > patch applicable to your changes. As you can see COPY FROM STDIN is > > reported as FILE. That's probably expected, but it is a little > > confusing for me since STDIN and STDIO sound similar. What is the > > purpose of STDIO? When is the COPY command reported with io_target of > > STDIO? > > I checked for the type of the copy_src before it was correctly set, > therefore only reporting FILE type, but this will be fixed shortly in > v3. OK, would you mind to integrate my regression test initial patch as well in v3 or should I submit it later in a separate way? > Matthias
On Tue, 9 Feb 2021 at 08:12, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Feb 9, 2021 at 12:06 AM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > With [0] we got COPY progress reporting. Before the column names of > > this newly added view are effectively set in stone with the release of > > pg14, I propose the following set of relatively small patches. These > > are v2, because it is a patchset that is based on a set of patches > > that I previously posted in [0]. > > Thanks for working on the patches. Here are some comments: > > 0001 - +1 to add tuples_excluded and the patch LGTM. > > 0002 - Yes, the tuples_processed or tuples_excluded makes more sense > to me than lines_processed and lines_excluded. The patch LGTM. > > 0003 - Instead of just adding the progress reporting to "See also" > sections in the footer of the respective pages analyze, cluster and > others, it would be nice if we have a mention of it in the description > as pg_basebackup has something like below: > <para> > Whenever <application>pg_basebackup</application> is taking a base > backup, the server's <structname>pg_stat_progress_basebackup</structname> > view will report the progress of the backup. > See <xref linkend="basebackup-progress-reporting"/> for details. Added > 0004 - > 1) How about PROGRESS_COPY_COMMAND_TYPE instead of > PROGRESS_COPY_COMMAND? The names looks bit confusing with the existing > PROGRESS_COMMAND_COPY. The current name is consistent with the naming of the other command-reporting progress views; CREATEIDX and CLUSTER both use the *_COMMAND as this column indexes' internal name. > 0005 - > 1) How about > + or <literal>CALLBACK</literal> (used in the table > synchronization background > + worker). > instead of > + or <literal>CALLBACK</literal> (used in the tablesync background > + worker). > Because "table synchronization" is being used in logical-replication.sgml. Fixed > 2) I think cstate->copy_src = COPY_CALLBACK is assigned after the > switch case added in copyfrom.c > if (data_source_cb) > { > cstate->copy_src = COPY_CALLBACK; > cstate->data_source_cb = data_source_cb; > } Yes, I noticed this too while working on the patchset, but apparently didn't act on this... Fixed in attachted version. > Also, you can add this to the current commitfest. See https://commitfest.postgresql.org/32/2977/ On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <josef.simanek@gmail.com> wrote: > > OK, would you mind to integrate my regression test initial patch as > well in v3 or should I submit it later in a separate way? Attached, with minor fixes With regards, Matthias van de Meent
Вложения
- v3-0006-Add-progress-reporting-regression-tests.patch
- v3-0004-Add-a-command-column-to-the-copy-progress-view.patch
- v3-0003-Add-backlinks-to-progress-reporting-documentation.patch
- v3-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch
- v3-0005-Add-a-io_target-column-to-the-copy-progress-view.patch
- v3-0001-Add-progress-reporting-for-excluded-rows.patch
On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > Also, you can add this to the current commitfest. > > See https://commitfest.postgresql.org/32/2977/ > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <josef.simanek@gmail.com> wrote: > > > > OK, would you mind to integrate my regression test initial patch as > > well in v3 or should I submit it later in a separate way? > > Attached, with minor fixes Why do we need to have a new test file progress.sql for the test cases? Can't we add them into existing copy.sql or copy2.sql? Or do you have a plan to add test cases into progress.sql for other progress reporting commands? IMO, it's better not add any new test file but add the tests to existing files. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > Also, you can add this to the current commitfest. > > > > See https://commitfest.postgresql.org/32/2977/ > > > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <josef.simanek@gmail.com> wrote: > > > > > > OK, would you mind to integrate my regression test initial patch as > > > well in v3 or should I submit it later in a separate way? > > > > Attached, with minor fixes > > Why do we need to have a new test file progress.sql for the test > cases? Can't we add them into existing copy.sql or copy2.sql? Or do > you have a plan to add test cases into progress.sql for other progress > reporting commands? I don't mind moving the test into copy or copy2, but the main reason to put it in a seperate file is to test the 'copy' component of the feature called 'progress reporting'. If the feature instead is 'copy' and 'progress reporting' is part of that feature, then I'd put it in the copy-tests, but because the documentation of this has it's own docs page 'progress reporting', and because 'copy' is a subsection of that, I do think that this feature warrants its own regression test file. There are no other tests for the progress reporting feature yet, because COPY ... FROM is the only command that is progress reported _and_ that can fire triggers while running the command, so checking the progress view during the progress reported command is only feasable in COPY progress reporting. To test the other progress reporting views, we would need multiple sessions, which I believe is impossible in this test format. Please correct me if I'm wrong; I'd love to add tests for the other components. That will not be in this patchset, though. > IMO, it's better not add any new test file but add the tests to existing files. In general I agree, but in some cases (e.g. new system component, new full-fledged feature), new test files are needed. I think that this could be one of those cases. With regards, Matthias van de Meent
čt 11. 2. 2021 v 15:27 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal: > > On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent > > <boekewurm+postgres@gmail.com> wrote: > > > > Also, you can add this to the current commitfest. > > > > > > See https://commitfest.postgresql.org/32/2977/ > > > > > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <josef.simanek@gmail.com> wrote: > > > > > > > > OK, would you mind to integrate my regression test initial patch as > > > > well in v3 or should I submit it later in a separate way? > > > > > > Attached, with minor fixes > > > > Why do we need to have a new test file progress.sql for the test > > cases? Can't we add them into existing copy.sql or copy2.sql? Or do > > you have a plan to add test cases into progress.sql for other progress > > reporting commands? > > I don't mind moving the test into copy or copy2, but the main reason > to put it in a seperate file is to test the 'copy' component of the > feature called 'progress reporting'. If the feature instead is 'copy' > and 'progress reporting' is part of that feature, then I'd put it in > the copy-tests, but because the documentation of this has it's own > docs page 'progress reporting', and because 'copy' is a subsection of > that, I do think that this feature warrants its own regression test > file. > > There are no other tests for the progress reporting feature yet, > because COPY ... FROM is the only command that is progress reported > _and_ that can fire triggers while running the command, so checking > the progress view during the progress reported command is only > feasable in COPY progress reporting. To test the other progress > reporting views, we would need multiple sessions, which I believe is > impossible in this test format. Please correct me if I'm wrong; I'd > love to add tests for the other components. That will not be in this > patchset, though. > > > IMO, it's better not add any new test file but add the tests to existing files. > > In general I agree, but in some cases (e.g. new system component, new > full-fledged feature), new test files are needed. I think that this > could be one of those cases. I have split it since it should be the start of progress reporting testing at all. If you better consider this as part of COPY testing, feel free to move it to already existing copy testing related files. There's no real reason to keep it separated if not needed. > > With regards, > > Matthias van de Meent
On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <josef.simanek@gmail.com> wrote:
čt 11. 2. 2021 v 15:27 odesílatel Matthias van de Meent
<boekewurm+postgres@gmail.com> napsal:
>
> On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> > > > Also, you can add this to the current commitfest.
> > >
> > > See https://commitfest.postgresql.org/32/2977/
> > >
> > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek <josef.simanek@gmail.com> wrote:
> > > >
> > > > OK, would you mind to integrate my regression test initial patch as
> > > > well in v3 or should I submit it later in a separate way?
> > >
> > > Attached, with minor fixes
> >
> > Why do we need to have a new test file progress.sql for the test
> > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > you have a plan to add test cases into progress.sql for other progress
> > reporting commands?
>
> I don't mind moving the test into copy or copy2, but the main reason
> to put it in a seperate file is to test the 'copy' component of the
> feature called 'progress reporting'. If the feature instead is 'copy'
> and 'progress reporting' is part of that feature, then I'd put it in
> the copy-tests, but because the documentation of this has it's own
> docs page 'progress reporting', and because 'copy' is a subsection of
> that, I do think that this feature warrants its own regression test
> file.
>
> There are no other tests for the progress reporting feature yet,
> because COPY ... FROM is the only command that is progress reported
> _and_ that can fire triggers while running the command, so checking
> the progress view during the progress reported command is only
> feasable in COPY progress reporting. To test the other progress
> reporting views, we would need multiple sessions, which I believe is
> impossible in this test format. Please correct me if I'm wrong; I'd
> love to add tests for the other components. That will not be in this
> patchset, though.
>
> > IMO, it's better not add any new test file but add the tests to existing files.
>
> In general I agree, but in some cases (e.g. new system component, new
> full-fledged feature), new test files are needed. I think that this
> could be one of those cases.
I have split it since it should be the start of progress reporting
testing at all. If you better consider this as part of COPY testing,
feel free to move it to already existing copy testing related files.
There's no real reason to keep it separated if not needed.
+1 to move those test cases to existing copy test files.
On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <josef.simanek@gmail.com> wrote: >> I have split it since it should be the start of progress reporting >> testing at all. If you better consider this as part of COPY testing, >> feel free to move it to already existing copy testing related files. >> There's no real reason to keep it separated if not needed. > > > +1 to move those test cases to existing copy test files. Thanks for your reviews. PFA v4 of the patchset, in which the tests are put into copy.sql (well, input/copy.source). This also adds tests for correctly reporting COPY ... FROM 'file'. I've changed the notice-alerted format from manually naming each column to calling to_jsonb and removing the unstable columns from the reported value; this should therefore be stable and give direct notice to changes in the view. With regards, Matthias van de Meent.
Вложения
- v4-0004-Add-a-command-column-to-the-copy-progress-view.patch
- v4-0006-Add-copy-progress-reporting-regression-tests.patch
- v4-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch
- v4-0003-Add-backlinks-to-progress-reporting-documentation.patch
- v4-0005-Add-a-io_target-column-to-the-copy-progress-view.patch
- v4-0001-Add-progress-reporting-for-excluded-rows.patch
On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <josef.simanek@gmail.com> wrote: > >> I have split it since it should be the start of progress reporting > >> testing at all. If you better consider this as part of COPY testing, > >> feel free to move it to already existing copy testing related files. > >> There's no real reason to keep it separated if not needed. > > > > > > +1 to move those test cases to existing copy test files. > > Thanks for your reviews. PFA v4 of the patchset, in which the tests > are put into copy.sql (well, input/copy.source). This also adds tests > for correctly reporting COPY ... FROM 'file'. PFA v5, which fixes a failure in the pg_upgrade regression tests due to incorrect usage of @abs_builddir@. I had the changes staged, but forgot to add them to the patches. Sorry for the noise. -Matthias
Вложения
- v5-0006-Add-copy-progress-reporting-regression-tests.patch
- v5-0005-Add-copy-progress-reporting-regression-tests.patch
- v5-0002-Add-backlinks-to-progress-reporting-documentation.patch
- v5-0004-Add-a-io_target-column-to-the-copy-progress-view.patch
- v5-0003-Add-a-command-column-to-the-copy-progress-view.patch
- v5-0001-Rename-lines-to-tuples-in-COPY-progress-reporting.patch
On Fri, Feb 12, 2021 at 5:40 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <josef.simanek@gmail.com> wrote: > > >> I have split it since it should be the start of progress reporting > > >> testing at all. If you better consider this as part of COPY testing, > > >> feel free to move it to already existing copy testing related files. > > >> There's no real reason to keep it separated if not needed. > > > > > > > > > +1 to move those test cases to existing copy test files. > > > > Thanks for your reviews. PFA v4 of the patchset, in which the tests > > are put into copy.sql (well, input/copy.source). This also adds tests > > for correctly reporting COPY ... FROM 'file'. > > PFA v5, which fixes a failure in the pg_upgrade regression tests due > to incorrect usage of @abs_builddir@. I had the changes staged, but > forgot to add them to the patches. > > Sorry for the noise. Looks like the patch 0001 that was adding tuples_excluded was missing and cfbot is also not happy with the v5 patch set. Maybe, we may not need 6 patches as they are relatively very small patches. IMO, the following are enough: 0001 - tuples_excluded, lines to tuples change, COPY FROM/COPY TO addition, io_target -- basically all the code related patches can go into 0001 0002 - documentation 0003 - tests - I think we can only have a simple test(in copy2.sql) showing stdin/stdout and not have file related tests. Because these patches work as expected, please find my testing below: postgres=# select * from pg_stat_progress_copy; pid | datid | datname | relid | command | io_target | bytes_processed | bytes_total | tuples_processed | tuples_excluded ---------+-------+----------+-------+-----------+-----------+-----------------+-------------+------------------+----------------- 2886103 | 12977 | postgres | 16384 | COPY FROM | FILE | 83099648 | 85777795 | 9553999 | 1111111 (1 row) postgres=# select * from pg_stat_progress_copy; pid | datid | datname | relid | command | io_target | bytes_processed | bytes_total | tuples_processed | tuples_excluded ---------+-------+----------+-------+-----------+-----------+-----------------+-------------+------------------+----------------- 2886103 | 12977 | postgres | 16384 | COPY FROM | STDIO | 0 | 0 | 0 | 0 (1 row) postgres=# select * from pg_stat_progress_copy; pid | datid | datname | relid | command | io_target | bytes_processed | bytes_total | tuples_processed | tuples_excluded ---------+-------+----------+-------+---------+-----------+-----------------+-------------+------------------+----------------- 2886103 | 12977 | postgres | 16384 | COPY TO | FILE | 37771610 | 0 | 4999228 | 0 (1 row) postgres=# select * from pg_stat_progress_copy; pid | datid | datname | relid | command | io_target | bytes_processed | bytes_total | tuples_processed | tuples_excluded ---------+-------+----------+-------+-----------+-----------+-----------------+-------------+------------------+----------------- 2892816 | 12977 | postgres | 16384 | COPY FROM | CALLBACK | 249777823 | 0 | 31888892 | 0 (1 row) With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Fri, 12 Feb 2021 at 13:40, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Feb 12, 2021 at 5:40 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent > > <boekewurm+postgres@gmail.com> wrote: > > > > > > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > > > > > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek <josef.simanek@gmail.com> wrote: > > > >> I have split it since it should be the start of progress reporting > > > >> testing at all. If you better consider this as part of COPY testing, > > > >> feel free to move it to already existing copy testing related files. > > > >> There's no real reason to keep it separated if not needed. > > > > > > > > > > > > +1 to move those test cases to existing copy test files. > > > > > > Thanks for your reviews. PFA v4 of the patchset, in which the tests > > > are put into copy.sql (well, input/copy.source). This also adds tests > > > for correctly reporting COPY ... FROM 'file'. > > > > PFA v5, which fixes a failure in the pg_upgrade regression tests due > > to incorrect usage of @abs_builddir@. I had the changes staged, but > > forgot to add them to the patches. > > > > Sorry for the noise. > > Looks like the patch 0001 that was adding tuples_excluded was missing > and cfbot is also not happy with the v5 patch set. > > Maybe, we may not need 6 patches as they are relatively very small > patches. IMO, the following are enough: > > 0001 - tuples_excluded, lines to tuples change, COPY FROM/COPY TO > addition, io_target -- basically all the code related patches can go > into 0001 > 0002 - documentation > 0003 - tests - I think we can only have a simple test(in copy2.sql) > showing stdin/stdout and not have file related tests. Because these > patches work as expected, please find my testing below: I agree with that split, the current split was mainly for the reason that some of the patches (except 1, 3 and 6, which are quite substantially different from the rest) each have had their seperate concerns voiced about the changes contained in that patch (be it direct or indirect); e.g. the renaming of lines_* to tuples_* is in my opionion a good thing, and Josef disagrees. Anyway, please find attached patchset v6 applying that split. Regarding only a simple test: I believe it is useful to have at least a test that distinguishes between two different import types. I've made a mistake before, so I think it is useful to add a regression tests to prevent someone else from making this same mistake (trivial as it may be). Additionally, testing in copy.sql also allows for validating the bytes_total column, which cannot be tested in copy2.sql due to the lack of COPY FROM FILE -support over there. I'm +0.5 on keeping it as-is in copy.sql, so unless someone has some strong feelings about this, I'd like to keep it in copy.sql. With regards, Matthias van de Meent.
Вложения
Hi, I agree with these changes in general - I have a couple minor comment: 1) 0001 - the SGML docs are missing a couple tags - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think we do this in our codebase. Move the variable declarations to the beginning, get rid of the out block. Or something like that. - I fir the "io_target" name misleading, because in some cases it's actually the *source*. 2) 0002 - I believe "each backend ... reports its" (not theirs), right? - This seems more like a generic docs improvement, not quite specific to the COPY progress patch. It's a bit buried, maybe it should be posted separately. OTOH it's pretty small. 3) 0003 - Some whitespace noise, triggering "git am" warnings. - Might be good to briefly explain what the regression test does with the triggers, etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Hi, Thank you all for the suggestions. PFA version 8 of the patchset, in which I have applied most of your comments. Unless explicitly named below, I have applied the suggestions. On Mon, 15 Feb 2021 at 17:07, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think > we do this in our codebase. I saw this being used in (re)index progress reporting, that's where I took inspiration from. It has been fixed in the attached version. > - I fir the "io_target" name misleading, because in some cases it's > actually the *source*. Yes, I was also not quite happy with this, but couldn't find a better one at the point of writing the initial patchset. Would "io_operations", "io_port", "operates_through" or "through" maybe be better? With regards, Matthias van de Meent
Вложения
On 2/18/21 4:46 PM, Matthias van de Meent wrote: > Hi, > > Thank you all for the suggestions. PFA version 8 of the patchset, in > which I have applied most of your comments. Unless explicitly named > below, I have applied the suggestions. > Thanks. > > On Mon, 15 Feb 2021 at 17:07, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think >> we do this in our codebase. > > I saw this being used in (re)index progress reporting, that's where I > took inspiration from. It has been fixed in the attached version. > Hmmm, good point. I haven't looked at the other places reporting progress and I only ever saw this pattern in old code. I kinda dislike these blocks, but admittedly that's rather subjective view. So if other similar places do this when reporting progress, this probably should too. What's your opinion on this? >> - I fir the "io_target" name misleading, because in some cases it's >> actually the *source*. > > Yes, I was also not quite happy with this, but couldn't find a better > one at the point of writing the initial patchset. Would > "io_operations", "io_port", "operates_through" or "through" maybe be > better? > No idea. Let's see if someone has a better proposal ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company