Discussion: Add LZ4 compression in pg_dump
Attachments
On Fri, Feb 25, 2022 at 12:05:31PM +0000, Georgios wrote:
> The first commit does the heavy lifting required for additional compression
> methods. It expands testing coverage for the already supported gzip
> compression. Commit bf9aa490db introduced cfp in compress_io.{c,h} with the
> intent of unifying compression related code and allowing for the
> introduction of additional archive formats. However, pg_backup_archiver.c
> was not using that API. This commit teaches pg_backup_archiver.c about cfp
> and uses it throughout.

Thanks for the patch.  I have a few high-level comments.

+    # Do not use --no-sync to give test coverage for data sync.
+    compression_gzip_directory_format => {
+        test_key => 'compression',

The tests for GZIP had better be split into their own commit, as that's a
coverage improvement for the existing code.  I was assuming that this was
going to be much larger :)

+/* Routines that support LZ4 compressed data I/O */
+#ifdef HAVE_LIBLZ4
+static void InitCompressorLZ4(CompressorState *cs);
+static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, ReadFunc readF);
+static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs,
+                                  const char *data, size_t dLen);
+static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs);
+#endif

Hmm.  This is the same set of APIs as ZLIB and NONE to init, read, write and
end, but for the LZ4 compressor (NONE has no init/end).  Wouldn't it be
better to refactor the existing pg_dump code to have a central structure
holding all the function definitions, so that those function signatures are
set in stone in the shape of a catalog of callbacks, making the addition of
more compression formats easier?  I would imagine that we'd split the code of
each compression method into its own file with its own context data.  This
would lead to the removal of compress_io.c, with its entry points
ReadDataFromArchive(), WriteDataToArchive() & co replaced by pointers to each
per-compression callback.
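The catalog-of-callbacks idea above could look roughly like the sketch below.  Every name here (CompressorRoutines, lookup_compressor, the no-op stubs) is illustrative only and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for pg_dump's real compressor state. */
typedef struct CompressorState CompressorState;

/* One catalog entry per compression method: init, read, write, end. */
typedef struct CompressorRoutines
{
	const char *name;
	void		(*init) (CompressorState *cs);
	void		(*read_data) (CompressorState *cs);
	void		(*write_data) (CompressorState *cs, const char *data, size_t dLen);
	void		(*end) (CompressorState *cs);
} CompressorRoutines;

/* No-op stubs so the catalog below is complete. */
static void noop_cs(CompressorState *cs) { (void) cs; }
static void noop_write(CompressorState *cs, const char *data, size_t dLen)
{
	(void) cs; (void) data; (void) dLen;
}

/* The catalog: compiled-in support decides which rows exist. */
static const CompressorRoutines compressors[] = {
	{"none", NULL, noop_cs, noop_write, NULL},
	{"gzip", noop_cs, noop_cs, noop_write, noop_cs},
#ifdef HAVE_LIBLZ4
	{"lz4", noop_cs, noop_cs, noop_write, noop_cs},
#endif
};

/* Pick the callback table for a method name, or NULL if unsupported. */
static const CompressorRoutines *
lookup_compressor(const char *name)
{
	for (size_t i = 0; i < sizeof(compressors) / sizeof(compressors[0]); i++)
	{
		if (strcmp(compressors[i].name, name) == 0)
			return &compressors[i];
	}
	return NULL;
}
```

With something like this, EndCompressor() and friends reduce to dispatching through cs->routines, and adding zstd later becomes one more row in the catalog plus one new file.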
> Furthermore, compression was chosen based on the value of the level passed
> as an argument during the invocation of pg_dump, or some hardcoded defaults.
> This does not scale to more than one compression method. Now the method used
> for compression can be explicitly requested during command invocation, or
> fall back to hardcoded defaults. It is then stored in the relevant structs
> and passed to the relevant functions, alongside the compression level, which
> has lost its special meaning. The method used for compression is not yet
> stored in the actual archive. This is done in the next commit, which does
> introduce a new method.

That's one thing Robert was arguing about with pg_basebackup, so that would
be consistent, and the option set is backward-compatible as far as I can tell
by reading the code.
--
Michael
The patch is failing on cfbot/freebsd.
http://cfbot.cputube.org/georgios-kokolatos.html

Also, I wondered if you'd looked at the "high compression" interfaces in
lz4hc.h?  Should pg_dump use those?

On Fri, Feb 25, 2022 at 08:03:40AM -0600, Justin Pryzby wrote:
> Thanks for working on this. Your 0001 looks similar to what I did for zstd
> 1-2 years ago.
> https://commitfest.postgresql.org/32/2888/
>
> I rebased and attached the latest patches I had in case they're useful to
> you. I'd like to see zstd included in pg_dump eventually, but it was too
> much work to shepherd the patches. Now that seems reasonable for pg16.
>
> With the other compression patches I've worked on, we've used an extra
> patch that changes the default to the new compression algorithm, to force
> cfbot to exercise the new code.
>
> Do you know the process with commitfests and cfbot?
> There's also this, which allows running the tests on cirrus before mailing
> the patch to the hackers list.
> ./src/tools/ci/README
It seems development on this has stalled. If there's no further work happening I guess I'll mark the patch returned with feedback. Feel free to resubmit it to the next CF when there's progress.
On Fri, Mar 25, 2022 at 01:20:47AM -0400, Greg Stark wrote:
> It seems development on this has stalled. If there's no further work
> happening I guess I'll mark the patch returned with feedback. Feel
> free to resubmit it to the next CF when there's progress.

Since it's a reasonably large patch (and one that I had myself started
before), it's only been twenty-some days since the (minor) review comments,
the focus right now is on committing features rather than reviewing new
patches, the patch itself is only a month old, and its 0002 is not intended
for pg15, therefore I'm moving it to the next CF, where I hope to work with
its authors to progress it.
--
Justin
On Fri, Mar 25, 2022 at 6:22 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Mar 25, 2022 at 01:20:47AM -0400, Greg Stark wrote:
> > It seems development on this has stalled. If there's no further work
> > happening I guess I'll mark the patch returned with feedback. Feel
> > free to resubmit it to the next CF when there's progress.
>
> Since it's a reasonably large patch (and one that I had myself started
> before), it's only been twenty-some days since the (minor) review comments,
> the focus right now is on committing features rather than reviewing new
> patches, the patch itself is only a month old, and its 0002 is not intended
> for pg15, therefore I'm moving it to the next CF, where I hope to work with
> its authors to progress it.

Hi Folks,

Here is an updated patchset from Georgios, with minor assistance from myself.
The comments above should be addressed, but please let us know if there are
other things to go over.  A functional change in this patchset is that when
`--compress=none` is passed to pg_dump, it will not compress for the
directory format (previously, it would use gzip if present).  The previous
default behavior is retained.

- Rachel
------- Original Message -------
On Saturday, March 26th, 2022 at 12:13 AM, Rachel Heaton <rachelmheaton@gmail.com> wrote:

> On Fri, Mar 25, 2022 at 6:22 AM Justin Pryzby pryzby@telsasoft.com wrote:
> > On Fri, Mar 25, 2022 at 01:20:47AM -0400, Greg Stark wrote:
> > > It seems development on this has stalled. If there's no further work
> > > happening I guess I'll mark the patch returned with feedback. Feel
> > > free to resubmit it to the next CF when there's progress.

We had made some progress, but we didn't want to distract the list with too
many emails.  Of course it seemed stalled to an outside observer; I simply
wanted to set the record straight and say that we are actively working on it.

> > Since it's a reasonably large patch (and one that I had myself started
> > before), it's only been twenty-some days since the (minor) review
> > comments, the focus right now is on committing features rather than
> > reviewing new patches, the patch itself is only a month old, and its
> > 0002 is not intended for pg15, therefore I'm moving it to the next CF,
> > where I hope to work with its authors to progress it.

Thank you.  It is much appreciated.  We will send updates when the next
commitfest starts in July, so as not to distract from the 15 work.  Then we
can take it from there.

> Hi Folks,
>
> Here is an updated patchset from Georgios, with minor assistance from myself.
> The comments above should be addressed, but please let us know if

A small amendment to the above statement.  This patchset does not include the
refactoring of compress_io suggested by Mr Paquier in the same thread, as it
is missing documentation.  An updated version will be sent to include those
changes in the next commitfest.

> there are other things to go over. A functional change in this
> patchset is when `--compress=none` is passed to pg_dump, it will not
> compress for directory type (previously, it would use gzip if
> present). The previous default behavior is retained.
>
> - Rachel
On Fri, Mar 25, 2022 at 11:43:17PM +0000, gkokolatos@pm.me wrote:
> On Saturday, March 26th, 2022 at 12:13 AM, Rachel Heaton <rachelmheaton@gmail.com> wrote:
>> Here is an updated patchset from Georgios, with minor assistance from myself.
>> The comments above should be addressed, but please let us know if
>
> A small amendment to the above statement. This patchset does not include
> the refactoring of compress_io suggested by Mr Paquier in the same thread,
> as it is missing documentation. An updated version will be sent to include
> those changes on the next commitfest.

The refactoring using callbacks would make the code much cleaner IMO in the
long term, with zstd waiting in the queue.  That said, I see some pieces of
the patch set that could be merged now, without waiting for the development
cycle of 16 to begin: 0001, which adds more tests, and 0002.

I have a question about 0002, actually.  What has led you to the conclusion
that this code is dead and could be removed?
--
Michael
On Sat, Mar 26, 2022 at 02:57:50PM +0900, Michael Paquier wrote:
> I have a question about 0002, actually. What has led you to the
> conclusion that this code is dead and could be removed?

See 0001 and the manpage.

+       'pg_dump: compression is not supported by tar archive format');

When I submitted a patch to support zstd, I spent a while trying to make
compression work with tar, but it's a significant effort and better done
separately.
LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4 versions.  I ran into that on
an Ubuntu LTS, so I don't think the library is so old that it shouldn't be
handled more gracefully.  The LZ4 code should either have an explicit version
check, or else shouldn't depend on that feature (or should define a safe
fallback if the library header doesn't define it).
https://packages.ubuntu.com/liblz4-1

0003: typo: of legacy => or legacy

There are a large number of ifdefs being added here - it'd be nice to
minimize that.  basebackup was organized to use separate files, which is one
way.

$ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
src/bin/pg_dump/compress_io.c:19

In last year's CF entry, I had made a union within CompressorState.  LZ4
doesn't need z_streamp (and zstd will need ZSTD_outBuffer, ZSTD_inBuffer,
ZSTD_CStream).

0002: I wonder if you're able to re-use any of the basebackup parsing stuff
from commit ffd53659c.  You're passing both the compression method *and*
level.  I think there should be a structure which includes both.  In the
future, that can also handle additional options.  I hope to re-use these same
things for wal_compression=method:level.

You renamed this:

|- COMPR_ALG_LIBZ
|-} CompressionAlgorithm;
|+ COMPRESSION_GZIP,
|+} CompressionMethod;

..But I don't think that's an improvement.  If you were to change it, it
should say something like PGDUMP_COMPRESS_ZLIB, since there are other
compression structs and typedefs.  zlib is not identical to gzip, which uses
a different header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is
incorrect.

The cf* changes in pg_backup_archiver could be split out into a separate
commit.  It's strictly a code simplification - not just preparation for more
compression algorithms.  The commit message should say "See also:
bf9aa490db24b2334b3595ee33653bf2fe39208c".
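For the LZ4F_HEADER_SIZE_MAX point and the CompressorState union above, a sketch of both could look as follows.  Member names are illustrative, not from any patch; 32 is simply a conservative upper bound chosen here, given that an LZ4 frame header is at most 19 bytes:

```c
#include <assert.h>
#include <stddef.h>

/*
 * LZ4F_HEADER_SIZE_MAX only appeared in newer liblz4 releases, so don't
 * depend on the library header defining it.  The real frame header is at
 * most 19 bytes; 32 is a conservative fallback.
 */
#ifndef LZ4F_HEADER_SIZE_MAX
#define LZ4F_HEADER_SIZE_MAX 32
#endif

/*
 * Union sketch: only one compressor's context is live at a time, so there
 * is no need to carry z_streamp around for LZ4 (or ZSTD_CStream for zstd
 * later).  void * stands in for the library-specific types here.
 */
typedef struct CompressorState
{
	int			method;			/* which union member is valid */
	union
	{
		struct
		{
			void	   *zp;		/* z_streamp in the real code */
		}			zlib;
		struct
		{
			void	   *ctx;	/* LZ4F compression context */
			char	   *buf;	/* staging buffer for compressed frames */
			size_t		buflen;
		}			lz4;
	}			u;
} CompressorState;
```

The guard also keeps the code compiling against both old and new liblz4 without an explicit version check.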
The changes in 0002 for cfopen_write seem insufficient:

|+	if (compressionMethod == COMPRESSION_NONE)
|+		fp = cfopen(path, mode, compressionMethod, 0);
|	else
|	{
|#ifdef HAVE_LIBZ
|		char	   *fname;
|
|		fname = psprintf("%s.gz", path);
|-		fp = cfopen(fname, mode, compression);
|+		fp = cfopen(fname, mode, compressionMethod, compressionLevel);
|		free_keep_errno(fname);
|#else

The only difference between the LIBZ and uncompressed cases is the file
extension, and it'll be the only difference with LZ4 too.  So I suggest to
first handle the file extension, after which the rest of the code path is not
conditional on the compression method.  I don't think cfopen_write even needs
HAVE_LIBZ - can't you handle that in cfopen_internal()?

This patch rejects -Z0, which ought to be accepted:

./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set

Your 0003 patch shouldn't reference LZ4:

+#ifndef HAVE_LIBLZ4
+	if (*compressionMethod == COMPRESSION_LZ4)
+		supports_compression = false;
+#endif

The 0004 patch renames zlibOutSize to outsize - I think the patch series
should be constructed so as to minimize the size of the method-specific
patches.  I say this anticipating also adding support for zstd.  The
preliminary patches should have all the boring stuff.  It would help for
reviewing to keep the patches split up, or to enumerate all the boring things
that are being renamed (like changing OutputContext to cfp, renaming
zlibOutSize, ...).

0004: The include should use <lz4.h> and not "lz4.h"

freebsd/cfbot is failing.

I suggested off-list to add an 0099 patch to change LZ4 to the default, to
exercise it more on CI.
--
Justin
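Handling the extension up front, as suggested above, could look like this sketch.  The enum values and helper names are made up for illustration; the real code would feed the resulting path to a single unconditional cfopen() call:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for pg_dump's compression method enum. */
typedef enum
{
	COMPRESSION_NONE,
	COMPRESSION_GZIP,
	COMPRESSION_LZ4
} CompressionMethod;

/* Decide the on-disk suffix once, up front. */
static const char *
compress_suffix(CompressionMethod method)
{
	switch (method)
	{
		case COMPRESSION_GZIP:
			return ".gz";
		case COMPRESSION_LZ4:
			return ".lz4";
		default:
			return "";
	}
}

/*
 * Build the final file name.  After this, one shared code path can open the
 * file for every method, and no HAVE_LIBZ ifdef is needed at this level --
 * library availability can be checked once, inside cfopen_internal().
 */
static void
make_compressed_path(char *dst, size_t dstlen,
					 const char *path, CompressionMethod method)
{
	snprintf(dst, dstlen, "%s%s", path, compress_suffix(method));
}
```

With this shape, adding zstd means adding one case to compress_suffix() rather than another ifdef'd branch in cfopen_write.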
On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> You're passing both the compression method *and* level. I think there
> should be a structure which includes both. In the future, that can also
> handle additional options.

I'm not sure if there's anything worth saving, but I did that last year with
0003-Support-multiple-compression-algs-levels-opts.patch.  I sent a rebased
copy off-list.
https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391

| fatal("not built with LZ4 support");
| fatal("not built with lz4 support");

Please use consistent capitalization of "lz4" - then the compiler can
optimize away duplicate strings.

> 0004: The include should use <lz4.h> and not "lz4.h"

Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.
> On 26 Mar 2022, at 17:21, Justin Pryzby <pryzby@telsasoft.com> wrote:
> I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> exercise it more on CI.

No need to change the defaults in autoconf for that.  The CFBot uses the
cirrus file in the tree, so changing what the job includes can be easily done
(assuming the CFBot hasn't changed this recently, which I think it hasn't).
I used that trick in the NSS patchset to add a completely new job for
--with-ssl=nss beside the --with-ssl=openssl job.
--
Daniel Gustafsson
https://vmware.com/
On Sun, Mar 27, 2022 at 12:37:27AM +0100, Daniel Gustafsson wrote:
> > On 26 Mar 2022, at 17:21, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > I suggested off-list to add an 0099 patch to change LZ4 to the default,
> > to exercise it more on CI.
>
> No need to change the defaults in autoconf for that. The CFBot uses the
> cirrus file in the tree so changing what the job includes can be easily
> done (assuming the CFBot hasn't changed this recently which I think it
> hasn't). I used that trick in the NSS patchset to add a completely new job
> for --with-ssl=nss beside the --with-ssl=openssl job.

I think you misunderstood - I'm suggesting not only to use --with-lz4 (which
has always been true since 93d973494), but to change pg_dump -Fc and -Fd to
use LZ4 by default (the same as I suggested for toast_compression,
wal_compression, and again in last year's patch to add zstd compression to
pg_dump, for which postgres was not ready).

@@ -781,6 +807,11 @@ main(int argc, char **argv)
 	compress.alg = COMPR_ALG_LIBZ;
 	compress.level = Z_DEFAULT_COMPRESSION;
 #endif
+
+#ifdef USE_ZSTD
+	compress.alg = COMPR_ALG_ZSTD;	/* set default for testing purposes */
+	compress.level = ZSTD_CLEVEL_DEFAULT;
+#endif
On Sat, Mar 26, 2022 at 12:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> from commit ffd53659c. You're passing both the compression method *and*
> level. I think there should be a structure which includes both. In the
> future, that can also handle additional options. I hope to re-use these
> same things for wal_compression=method:level.

Yeah, we should really try to use that infrastructure instead of inventing a
bunch of different ways to do it.  It might require some renaming here and
there, and I'm not sure whether we really want to try to rush all this into
the current release, but I think we should find a way to get it done.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sun, Mar 27, 2022 at 10:13:00AM -0400, Robert Haas wrote:
> On Sat, Mar 26, 2022 at 12:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > 0002: I wonder if you're able to re-use any of the basebackup parsing
> > stuff from commit ffd53659c. You're passing both the compression method
> > *and* level. I think there should be a structure which includes both.
> > In the future, that can also handle additional options. I hope to
> > re-use these same things for wal_compression=method:level.
>
> Yeah, we should really try to use that infrastructure instead of
> inventing a bunch of different ways to do it. It might require some
> renaming here and there, and I'm not sure whether we really want to
> try to rush all this into the current release, but I think we should
> find a way to get it done.

It seems like something a whole lot like parse_compress_options() should live
in common/.  Nobody wants to write it again, and I couldn't convince myself
to copy it when I looked at using it for wal_compression.

Maybe it should take an argument which specifies the default algorithm to use
for input of a numeric "level", and reject such input if no default is
specified: wal_compression has never taken a "level", so it's not useful or
desirable to have a bare level default to some new algorithm.

I could write this down if you want, although I'm not sure how/if you intend
other people to use bc_algorithm and bc_specification.  I don't think it's
important to do for v15, but it seems like it could be done after feature
freeze.  pg_dump+lz4 is targeting v16, although there's a cleanup patch that
could also go in before branching.
--
Justin
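A minimal sketch of what such a shared parser in common/ could look like, with a caller-supplied default algorithm for bare numeric input (all names here are hypothetical, not taken from ffd53659c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct CompressSpec
{
	char		algorithm[16];
	int			level;			/* -1 means "library default" */
} CompressSpec;

/*
 * Parse "method", "method:level", or a bare integer level.  For a bare
 * level, use default_alg if the caller provided one; otherwise reject the
 * input, which matches the wal_compression case discussed above.
 */
static bool
parse_compress_spec(const char *arg, const char *default_alg,
					CompressSpec *out)
{
	const char *colon = strchr(arg, ':');
	char	   *end;
	long		level;

	out->level = -1;

	if (colon == NULL)
	{
		/* Is the whole argument a number? */
		level = strtol(arg, &end, 10);
		if (*arg != '\0' && *end == '\0')
		{
			if (default_alg == NULL)
				return false;	/* bare level not allowed here */
			snprintf(out->algorithm, sizeof(out->algorithm), "%s", default_alg);
			out->level = (int) level;
			return true;
		}
		/* Plain method name, no level given. */
		snprintf(out->algorithm, sizeof(out->algorithm), "%s", arg);
		return true;
	}

	/* "method:level" form. */
	snprintf(out->algorithm, sizeof(out->algorithm), "%.*s",
			 (int) (colon - arg), arg);
	level = strtol(colon + 1, &end, 10);
	if (colon[1] == '\0' || *end != '\0')
		return false;			/* missing or malformed level */
	out->level = (int) level;
	return true;
}
```

pg_dump, pg_basebackup, and a future wal_compression=method:level could then share this one routine, each passing its own default (or NULL to forbid bare levels).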
> On 27 Mar 2022, at 00:51, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sun, Mar 27, 2022 at 12:37:27AM +0100, Daniel Gustafsson wrote: >>> On 26 Mar 2022, at 17:21, Justin Pryzby <pryzby@telsasoft.com> wrote: >> >>> I suggested off-list to add an 0099 patch to change LZ4 to the default, to >>> exercise it more on CI. >> >> No need to change the defaults in autoconf for that. The CFBot uses the cirrus >> file in the tree so changing what the job includes can be easily done (assuming >> the CFBot hasn't changed this recently which I think it hasn't). I used that >> trick in the NSS patchset to add a completely new job for --with-ssl=nss beside >> the --with-ssl=openssl job. > > I think you misunderstood - I'm suggesting not only to use with-lz4 (which was > always true since 93d973494), but to change pg_dump -Fc and -Fd to use LZ4 by > default (the same as I suggested for toast_compression, wal_compression, and > again in last year's patch to add zstd compression to pg_dump, for which > postgres was not ready). Right, I clearly misunderstood, thanks for the clarification. -- Daniel Gustafsson https://vmware.com/
On Sun, Mar 27, 2022 at 12:06 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Maybe it should take an argument which specifies the default algorithm to
> use for input of a numeric "level". And reject such input if not specified,
> since wal_compression has never taken a "level", so it's not useful or
> desirable to have that default to some new algorithm.

That sounds odd to me.  Wouldn't it be rather confusing if a bare integer
meant gzip for one case and lz4 for another?

> I could write this down if you want, although I'm not sure how/if you
> intend other people to use bc_algorithm and bc_specification. I don't
> think it's important to do for v15, but it seems like it could be done
> after feature freeze. pg_dump+lz4 is targeting v16, although there's a
> cleanup patch that could also go in before branching.

Well, I think the first thing we should do is get rid of enum
WalCompressionMethod and use enum WalCompression instead.  They've got the
same elements and very similar names, but the WalCompressionMethod ones just
have names like COMPRESSION_NONE, which is too generic, whereas
WalCompression uses WAL_COMPRESSION_NONE, which is better.  Then I think we
should also rename the COMPR_ALG_* constants in pg_dump.h to names like
DUMP_COMPRESSION_*.  Once we do that we've gotten rid of all the unprefixed
things that purport to be a list of compression algorithms.

Then, if people are willing to adopt the syntax that the
backup_compression.c/h stuff supports as a project standard (+1 from me), we
can go the other way and rename that stuff to be more generic, taking
"backup" out of the name.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Mar 28, 2022 at 08:36:15AM -0400, Robert Haas wrote:
> Well, I think the first thing we should do is get rid of enum
> WalCompressionMethod and use enum WalCompression instead. They've got
> the same elements and very similar names, but the WalCompressionMethod
> ones just have names like COMPRESSION_NONE, which is too generic,
> whereas WalCompression uses WAL_COMPRESSION_NONE, which is better.
> Then I think we should also rename the COMPR_ALG_* constants in
> pg_dump.h to names like DUMP_COMPRESSION_*. Once we do that we've
> gotten rid of all the unprefixed things that purport to be a list of
> compression algorithms.

Yes, having a centralized enum for the compression methods would make sense,
along with the routines to parse and get the compression method names.  At
least that would be one step towards more unity in src/common/.

> Then, if people are willing to adopt the syntax that the
> backup_compression.c/h stuff supports as a project standard (+1 from
> me) we can go the other way and rename that stuff to be more generic,
> taking backup out of the name.

I am not sure about the specification part, which is only used by base
backups; that has no client-server requirements, so option values would still
require their own grammar.
--
Michael
On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
> See 0001 and the manpage.
>
> + 'pg_dump: compression is not supported by tar archive format');
>
> When I submitted a patch to support zstd, I spent a while trying to make
> compression work with tar, but it's a significant effort and better done
> separately.

Wow.  This stuff is old enough to vote (c3e18804), dead since its
introduction.  There is indeed an argument for removing it; it is not good to
keep around code that has never been stressed and/or used.  Upon review, the
cleanup done looks correct, as we have never been able to generate .dat.gz
files for a dump in the tar format.

+ command_fails_like(
+     [ 'pg_dump', '--compress', '1', '--format', 'tar' ],

This addition depending on HAVE_LIBZ is a good thing as a reminder of any
work that could be done in 0002.  Now, that's been waiting for 20 years, so I
would not hold my breath on this support.  I think that this could be just
applied first, with 0002 on top of it, as a first improvement.

+ compress_cmd => [
+     $ENV{'GZIP_PROGRAM'},

Patch 0001 is missing an update of pg_dump's Makefile to pass down this
environment variable to the test scripts, no?

+ compress_cmd => [
+     $ENV{'GZIP_PROGRAM'},
+     '-f',
[...]
+     $ENV{'GZIP_PROGRAM'},
+     '-k', '-d',

-f and -d are available everywhere I looked, but is -k/--keep a portable
choice with a gzip command?  I don't see this option in OpenBSD, for one.  So
this test is going to cause problems on those buildfarm machines, at least.
Couldn't this part be replaced by a simple --test to check that what has been
compressed is in correct shape?  We know that this works, based on our recent
experiences with the other tests.
--
Michael
------- Original Message -------
On Tuesday, March 29th, 2022 at 9:27 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
> > See 0001 and the manpage.
> > + 'pg_dump: compression is not supported by tar archive format');
> > When I submitted a patch to support zstd, I spent a while trying to make
> > compression work with tar, but it's a significant effort and better done
> > separately.
>
> Wow. This stuff is old enough to vote (c3e18804), dead since its
> introduction. There is indeed an argument for removing it; it is not good
> to keep around code that has never been stressed and/or used. Upon review,
> the cleanup done looks correct, as we have never been able to generate
> .dat.gz files for a dump in the tar format.

Correct.  My driving force behind it was to ease the cleanup/refactoring work
that follows, by eliminating the callers of the GZ*() macros.

> + command_fails_like(
> + [ 'pg_dump', '--compress', '1', '--format', 'tar' ],
>
> This addition depending on HAVE_LIBZ is a good thing as a reminder of
> any work that could be done in 0002. Now that's waiting for 20 years
> so I would not hold my breath on this support. I think that this
> could be just applied first, with 0002 on top of it, as a first
> improvement.

Excellent, thank you.

> + compress_cmd => [
> + $ENV{'GZIP_PROGRAM'},
>
> Patch 0001 is missing an update of pg_dump's Makefile to pass down
> this environment variable to the test scripts, no?

Agreed.  It was not properly moved forward.  Fixed.

> + compress_cmd => [
> + $ENV{'GZIP_PROGRAM'},
> + '-f',
> [...]
> + $ENV{'GZIP_PROGRAM'},
> + '-k', '-d',
>
> -f and -d are available everywhere I looked at, but is -k/--keep a
> portable choice with a gzip command? I don't see this option in
> OpenBSD, for one. So this test is going to cause problems on those
> buildfarm machines, at least. Couldn't this part be replaced by a
> simple --test to check that what has been compressed is in correct
> shape? We know that this works, based on our recent experiences with
> the other tests.

I would argue that a simple '--test' will not do in this case, as the TAP
tests do need a file named <test>.sql to compare the contents with.  This
file is generated either directly by pg_dump itself, or by running pg_restore
on pg_dump's output.  In the case of compression, pg_dump will generate a
<test>.sql.<compression program suffix> which cannot be used in the
comparison tests.  So the intention of this block is not simply to test for
validity, but also to decompress pg_dump's output so that it can be used.

I updated the patch to simply remove the '-k' flag.

Please find v3 attached.  (Only 0001 and 0002 are relevant; 0003 and 0004 are
included for reference and are currently under active modification.)

Cheers,
//Georgios
On Tue, Mar 29, 2022 at 1:03 AM Michael Paquier <michael@paquier.xyz> wrote:
> > Then, if people are willing to adopt the syntax that the
> > backup_compression.c/h stuff supports as a project standard (+1 from
> > me) we can go the other way and rename that stuff to be more generic,
> > taking backup out of the name.
>
> I am not sure about the specification part which is only used by base
> backups that has no client-server requirements, so option values would
> still require their own grammar.

I don't know what you mean by this.  I think the specification stuff could be
reused in a lot of places.  If you can ask for a base backup with
zstd:level=3,long=1,fancystuff=yes or whatever we end up with, why not enable
exactly the same for every other place that uses compression?  I don't know
what "client-server requirements" means or what that has to do with this.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 29, 2022 at 09:46:27AM +0000, gkokolatos@pm.me wrote:
> On Tuesday, March 29th, 2022 at 9:27 AM, Michael Paquier <michael@paquier.xyz> wrote:
>> On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
>> Wow. This stuff is old enough to vote (c3e18804), dead since its
>> introduction. There is indeed an argument for removing it; it is not good
>> to keep around code that has never been stressed and/or used. Upon
>> review, the cleanup done looks correct, as we have never been able to
>> generate .dat.gz files for a dump in the tar format.
>
> Correct. My driving force behind it was to ease the cleanup/refactoring
> work that follows, by eliminating the callers of the GZ*() macros.

Makes sense to me.

>> + command_fails_like(
>> + [ 'pg_dump', '--compress', '1', '--format', 'tar' ],
>> This addition depending on HAVE_LIBZ is a good thing as a reminder of
>> any work that could be done in 0002. Now that's waiting for 20 years
>> so I would not hold my breath on this support. I think that this
>> could be just applied first, with 0002 on top of it, as a first
>> improvement.
>
> Excellent, thank you.

I have applied the test for --compress and --format=tar, separating it from
the rest.

While moving on with 0002, I have noticed the following in _StartBlob():

    if (AH->compression != 0)
        sfx = ".gz";
    else
        sfx = "";

Shouldn't this bit also be simplified, adding a fatal() like the other code
paths, for safety?

>> + compress_cmd => [
>> + $ENV{'GZIP_PROGRAM'},
>> + '-f',
>> [...]
>> + $ENV{'GZIP_PROGRAM'},
>> + '-k', '-d',
>> -f and -d are available everywhere I looked at, but is -k/--keep a
>> portable choice with a gzip command? I don't see this option in
>> OpenBSD, for one. So this test is going to cause problems on those
>> buildfarm machines, at least. Couldn't this part be replaced by a
>> simple --test to check that what has been compressed is in correct
>> shape? We know that this works, based on our recent experiences with
>> the other tests.
>
> I would argue that the simple '--test' will not do in this case, as the
> TAP tests do need a file named <test>.sql to compare the contents with.
> This file is generated either directly by pg_dump itself, or by running
> pg_restore on pg_dump's output. In the case of compression pg_dump will
> generate a <test>.sql.<compression program suffix> which cannot be
> used in the comparison tests. So the intention of this block is not to
> simply test for validity, but also to decompress pg_dump's output for it
> to be able to be used.

Ahh, I see, thanks.  I would add a comment about that in the area of
compression_gzip_plain_format.

+ my $supports_compression = check_pg_config("#define HAVE_LIBZ 1");

This part could be moved within the if block a couple of lines down.

+ my $compress_program = $ENV{GZIP_PROGRAM};

It seems to me that it is enough to rely on {compress_cmd}, hence there
should be no need for $compress_program, no?

It seems to me that we should have a description for compress_cmd at the top
of 002_pg_dump.pl (close to "Definition of the pg_dump runs to make").  There
is an order dependency with restore_cmd.

> I updated the patch to simply remove the '-k' flag.

Okay.
--
Michael
On Tue, Mar 29, 2022 at 09:14:03AM -0400, Robert Haas wrote:
> I don't know what you mean by this. I think the specification stuff
> could be reused in a lot of places. If you can ask for a base backup
> with zstd:level=3,long=1,fancystuff=yes or whatever we end up with,
> why not enable exactly the same for every other place that uses
> compression? I don't know what "client-server requirements" is or what
> that has to do with this.

Oh.  I think that I got confused here.  I saw the backup component in the
file name and associated it with the client/server choice that can be made in
the options of pg_basebackup.  But parse_bc_specification() does not include
any knowledge about that: pg_basebackup does this job in
parse_compress_options().  I agree that it looks possible to reuse that stuff
in more places than just base backups.
--
Michael
------- Original Message ------- On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Mar 29, 2022 at 09:46:27AM +0000, gkokolatos@pm.me wrote: > > On Tuesday, March 29th, 2022 at 9:27 AM, Michael Paquier michael@paquier.xyz wrote: > > > On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote: > > > + command_fails_like( > > > + [ 'pg_dump', '--compress', '1', '--format', 'tar' ], > > > This addition depending on HAVE_LIBZ is a good thing as a reminder of > > > any work that could be done in 0002. Now that's waiting for 20 years > > > so I would not hold my breath on this support. I think that this > > > could be just applied first, with 0002 on top of it, as a first > > > improvement. > > > > Excellent, thank you. > > I have applied the test for --compress and --format=tar, separating it > from the rest. Thank you. > While moving on with 0002, I have noticed the following in > > _StartBlob(): > if (AH->compression != 0) > sfx = ".gz"; > else > sfx = ""; > > Shouldn't this bit also be simplified, adding a fatal() like the other > code paths, for safety? Agreed. Fixed. > > > + compress_cmd => [ > > > + $ENV{'GZIP_PROGRAM'}, > > > + '-f', > > > [...] > > > + $ENV{'GZIP_PROGRAM'}, > > > + '-k', '-d', > > > -f and -d are available everywhere I looked at, but is -k/--keep a > > > portable choice with a gzip command? I don't see this option in > > > OpenBSD, for one. So this test is going to cause problems on those > > > buildfarm machines, at least. Couldn't this part be replaced by a > > > simple --test to check that what has been compressed is in correct > > > shape? We know that this works, based on our recent experiences with > > > the other tests. > > > > I would argue that the simple '--test' will not do in this case, as the > > TAP tests do need a file named <test>.sql to compare the contents with. > > This file is generated either directly by pg_dump itself, or by running > > pg_restore on pg_dump's output. 
In the case of compression pg_dump will > > generate a <test>.sql.<compression program suffix> which can not be > > used in the comparison tests. So the intention of this block, is not to > > simply test for validity, but to also decompress pg_dump's output for it > > to be able to be used. > > Ahh, I see, thanks. I would add a comment about that in the area of > compression_gzip_plain_format. Agreed. Comment added. > + my $supports_compression = check_pg_config("#define HAVE_LIBZ 1"); > > This part could be moved within the if block a couple of lines down. I moved it instead out of the for loop above to not have to call it on each iteration. > + my $compress_program = $ENV{GZIP_PROGRAM}; > > It seems to me that it is enough to rely on {compress_cmd}, hence > there should be no need for $compress_program, no? Maybe not. We don't want the tests to fail if the utility is not installed. That becomes even more evident as more methods are added. However, I realized that the presence of the environment variable does not guarantee that the utility is actually installed. In the attached, the existence of the utility is based on the return value of system_log(). > It seems to me that we should have a description for compress_cmd at > the top of 002_pg_dump.pl (close to "Definition of the pg_dump runs to > make"). There is an order dependency with restore_cmd. Agreed. Comment added. Cheers, //Georgios
On Wed, Mar 30, 2022 at 03:32:55PM +0000, gkokolatos@pm.me wrote: > On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier <michael@paquier.xyz> wrote: >> While moving on with 0002, I have noticed the following in >> >> _StartBlob(): >> if (AH->compression != 0) >> sfx = ".gz"; >> else >> sfx = ""; >> >> Shouldn't this bit also be simplified, adding a fatal() like the other >> code paths, for safety? > > Agreed. Fixed. Okay. 0002 looks fine as-is, and I don't mind the extra fatal() calls. These could be asserts but that's not a big deal one way or the other. And the cleanup is now applied. >> + my $compress_program = $ENV{GZIP_PROGRAM}; >> >> It seems to me that it is enough to rely on {compress_cmd}, hence >> there should be no need for $compress_program, no? > > Maybe not. We don't want to the tests to fail if the utility is not > installed. That becomes even more evident as more methods are added. > However I realized that the presence of the environmental variable does > not guarrantee that the utility is actually installed. In the attached, > the existance of the utility is based on the return value of system_log(). Hmm. [.. thinks ..] The thing that's itching me here is that you align the concept of compression with gzip, but that's not going to be true once more compression options are added to pg_dump, and that would make $supports_compression and $compress_program_exists incorrect. Perhaps the right answer would be to rename all that with a suffix like "_gzip" to make a difference? Or would there be enough control with a value of "compression_gzip" instead of "compression" in test_key? +my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h', + '>', '/dev/null') == 0); Do we need this command execution at all? In all the other tests, we rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do the same here. 
A last thing is that we should perhaps make a clear difference between the check that looks at if the code has been built with zlib and the check for the presence of GZIP_PROGRAM, as it can be useful in some environments to be able to run pg_dump built with zlib, even if the GZIP_PROGRAM command does not exist (I don't think this can be the case, but other tests are flexible). As of now, the patch relies on pg_dump enforcing uncompression if building under --without-zlib even if --compress/-Z is used, but that also means that those compression tests overlap with the existing tests in this case. Wouldn't it be more consistent to check after $supports_compression when executing the dump command for test_key = "compression[_gzip]"? This would mean keeping GZIP_PROGRAM as sole check when executing the compression command. -- Michael
On Thursday, March 31st, 2022 at 4:34 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Mar 30, 2022 at 03:32:55PM +0000, gkokolatos@pm.me wrote: > > On Wednesday, March 30th, 2022 at 7:54 AM, Michael Paquier michael@paquier.xyz wrote: > > Okay. 0002 looks fine as-is, and I don't mind the extra fatal() > calls. These could be asserts but that's not a big deal one way or > the other. And the cleanup is now applied. Thank you very much. > > > + my $compress_program = $ENV{GZIP_PROGRAM}; > > > It seems to me that it is enough to rely on {compress_cmd}, hence > > > there should be no need for $compress_program, no? > > > > Maybe not. We don't want to the tests to fail if the utility is not > > installed. That becomes even more evident as more methods are added. > > However I realized that the presence of the environmental variable does > > not guarrantee that the utility is actually installed. In the attached, > > the existance of the utility is based on the return value of system_log(). > > Hmm. [.. thinks ..] The thing that's itching me here is that you > align the concept of compression with gzip, but that's not going to be > true once more compression options are added to pg_dump, and that > would make $supports_compression and $compress_program_exists > incorrect. Perhaps the right answer would be to rename all that with > a suffix like "_gzip" to make a difference? Or would there be enough > control with a value of "compression_gzip" instead of "compression" in > test_key? I understand the itch. Indeed, when LZ4 is added as a compression method, this block changes slightly. I went with the minimum amount of change. Please find in 0001 of the attached this variable renamed to $gzip_program_exist. I thought that prefix would better match the already used $ENV{GZIP_PROGRAM}. > +my $compress_program_exists = (system_log("$ENV{GZIP_PROGRAM}", '-h', > + '>', '/dev/null') == 0); > > Do we need this command execution at all?
In all the other tests, we > rely on a simple "if (!defined $gzip || $gzip eq '');", so we could do > the same here. You are very correct that we are using the simple version, and that is what was included in the previous versions of the current patch. However, I did notice that the variable is hard-coded in Makefile.global.in and it does not go through configure. By now, gzip is considered an essential package in most installations, and this hard-coding makes sense. Though I did remove the utility from my system (apt remove gzip) and tried the test with the simple "if (!defined $gzip || $gzip eq '');", which predictably failed. For this, I went with the system call; it is not too expensive and is rather reliable. It is true that the rest of the TAP tests that use this, e.g. in pg_basebackup, also failed. There is an argument to go simple and I will be happy to revert to the previous version. > A last thing is that we should perhaps make a clear difference between > the check that looks at if the code has been built with zlib and the > check for the presence of GZIP_PROGRAM, as it can be useful in some > environments to be able to run pg_dump built with zlib, even if the > GZIP_PROGRAM command does not exist (I don't think this can be the > case, but other tests are flexible). You are very correct. We do that already in the current patch. Note that we skip the test only when we specifically have to execute a compression command. Not all compression tests define such a command, exactly so that we can test those cases as well. The point of using an external utility program is to extend coverage in previously untested yet supported scenarios, e.g. manual compression of the *.toc files. Also, in the case where it will actually skip the compression command because the gzip program is not present, it will execute the pg_dump command first.
> As of now, the patch relies on > pg_dump enforcing uncompression if building under --without-zlib even > if --compress/-Z is used, but that also means that those compression > tests overlap with the existing tests in this case. Wouldn't it be > more consistent to check after $supports_compression when executing > the dump command for test_key = "compression[_gzip]"? This would mean > keeping GZIP_PROGRAM as sole check when executing the compression > command. I can see the overlap case. Yet, I understand the test_key as serving a different purpose, as it is a key of %tests and %full_runs. I do not expect the database content of the generated dump to change based on which compression method is used. In the next round, I can see one explicitly requesting --compress=none to override defaults. There is a benefit to grouping the tests for this scenario under the same test_key, i.e. compression. Also, there will be cases where the program exists, yet the codebase is compiled without support for the method. Then compress_cmd or the restore_cmd that follows will fail. For example, in the plain output, if we try to uncompress the generated output, the test will fail with 'gzip: <filename> not in gzip format'. In the directory format the compress_cmd will compress the *.toc files, but the restore_cmd will fail because the build does not include support for them. In the attached version, I propose that the compression_cmd is converted into a hash. It contains two keys, the program and the arguments. Maybe it is easier to read than before or than simply grabbing the first element of the array. Cheers, //Georgios
On Fri, Apr 01, 2022 at 03:06:40PM +0000, gkokolatos@pm.me wrote: > I understand the itch. Indeed when LZ4 is added as compression method, this > block changes slightly. I went with the minimum amount changed. Please find > in 0001 of the attached this variable renamed as $gzip_program_exist. I thought > that as prefix it will match better the already used $ENV{GZIP_PROGRAM}. Hmm. I have spent some time on that, and upon review I really think that we should skip the tests marked as dedicated to the gzip compression entirely if the build is not compiled with this option, rather than letting the code run a dump for nothing in some cases, relying on the default to uncompress the contents in others. In the latter case, it happens that we have already some checks like defaults_custom_format, but you already mentioned that. We should also skip the later parts of the tests if the compression program does not exist as we rely on it, but only if the command does not exist. This will count for LZ4. > I can see the overlap case. Yet, I understand the test_key as serving different > purpose, as it is a key of %tests and %full_runs. I do not expect the database > content of the generated dump to change based on which compression method is used. Contrary to the current LZ4 tests in pg_dump, what we have here is a check for a command-level run and not a data-level check. So what's introduced is a new concept, and we need a new way to control if the tests should be entirely skipped or not, particularly if we finish by not using test_key to make the difference. Perhaps the best way to address that is to have a new keyword in the $runs structure. The attached defines a new compile_option, that can be completed later for new compression methods introduced in the tests. So the idea is to mark all the tests related to compression with the same test_key, and the tests can be skipped depending on what compile_option requires. 
> In the attached version, I propose that the compression_cmd is converted into > a hash. It contains two keys, the program and the arguments. Maybe it is easier > to read than before or than simply grabbing the first element of the array. Splitting the program and its arguments makes sense. At the end I am finishing with the attached. I also saw an overlap with the addition of --jobs for the directory format vs not using the option, so I have removed the case where --jobs was not used in the directory format. -- Michael
------- Original Message ------- On Tuesday, April 5th, 2022 at 3:34 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Apr 01, 2022 at 03:06:40PM +0000, gkokolatos@pm.me wrote: > Splitting the program and its arguments makes sense. Great. > At the end I am finishing with the attached. I also saw an overlap > with the addition of --jobs for the directory format vs not using the > option, so I have removed the case where --jobs was not used in the > directory format. Thank you. I agree with the attached and I will carry it forward to the rest of the patchset. Cheers, //Georgios > -- > Michael
On Tue, Apr 05, 2022 at 07:13:35AM +0000, gkokolatos@pm.me wrote: > Thank you. I agree with the attached and I will carry it forward to the > rest of the patchset. No need to carry it forward anymore, I think ;) -- Michael
------- Original Message ------- On Tuesday, April 5th, 2022 at 12:55 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Apr 05, 2022 at 07:13:35AM +0000, gkokolatos@pm.me wrote: > No need to carry it forward anymore, I think ;) Thank you for committing! Cheers, //Georgios > -- > Michael
Hi, Will you be able to send a rebased patch for the next CF ? If you update for the review comments I sent in March, I'll plan to do another round of review. On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > shouldn't be handled more gracefully. LZ4 should either have an explicit > version check, or else shouldn't depend on that feature (or should define a > safe fallback version if the library header doesn't define it). > > https://packages.ubuntu.com/liblz4-1 > > 0003: typo: of legacy => or legacy > > There are a large number of ifdefs being added here - it'd be nice to minimize > that. basebackup was organized to use separate files, which is one way. > > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c > src/bin/pg_dump/compress_io.c:19 > > In last year's CF entry, I had made a union within CompressorState. LZ4 > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer, > ZSTD_CStream). > > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff > from commit ffd53659c. You're passing both the compression method *and* level. > I think there should be a structure which includes both. In the future, that > can also handle additional options. I hope to re-use these same things for > wal_compression=method:level. > > You renamed this: > > |- COMPR_ALG_LIBZ > |-} CompressionAlgorithm; > |+ COMPRESSION_GZIP, > |+} CompressionMethod; > > ..But I don't think that's an improvement. If you were to change it, it should > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > structs and typedefs. zlib is not idential to gzip, which uses a different > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. > > The cf* changes in pg_backup_archiver could be split out into a separate > commit. 
It's strictly a code simplification - not just preparation for more > compression algorithms. The commit message should "See also: > bf9aa490db24b2334b3595ee33653bf2fe39208c". > > The changes in 0002 for cfopen_write seem insufficient: > |+ if (compressionMethod == COMPRESSION_NONE) > |+ fp = cfopen(path, mode, compressionMethod, 0); > | else > | { > | #ifdef HAVE_LIBZ > | char *fname; > | > | fname = psprintf("%s.gz", path); > |- fp = cfopen(fname, mode, compression); > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel); > | free_keep_errno(fname); > | #else > > The only difference between the LIBZ and uncompressed case is the file > extension, and it'll be the only difference with LZ4 too. So I suggest to > first handle the file extension, and the rest of the code path is not > conditional on the compression method. I don't think cfopen_write even needs > HAVE_LIBZ - can't you handle that in cfopen_internal() ? > > This patch rejects -Z0, which ought to be accepted: > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set > > Your 0003 patch shouldn't reference LZ4: > +#ifndef HAVE_LIBLZ4 > + if (*compressionMethod == COMPRESSION_LZ4) > + supports_compression = false; > +#endif > > The 0004 patch renames zlibOutSize to outsize - I think the patch series should > be constructed such as to minimize the size of the method-specific patches. I > say this anticipating also adding support for zstd. The preliminary patches > should have all the boring stuff. It would help for reviewing to keep the > patches split up, or to enumerate all the boring things that are being renamed > (like change OutputContext to cfp, rename zlibOutSize, ...). > > 0004: The include should use <lz4.h> and not "lz4.h" > > freebsd/cfbot is failing. > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > exercise it more on CI. 
On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > You're passing both the compression method *and* level. I think there should > > be a structure which includes both. In the future, that can also handle > > additional options. > > I'm not sure if there's anything worth saving, but I did that last year with > 0003-Support-multiple-compression-algs-levels-opts.patch > I sent a rebased copy off-list. > https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391 > > | fatal("not built with LZ4 support"); > | fatal("not built with lz4 support"); > > Please use consistent capitalization of "lz4" - then the compiler can optimize > away duplicate strings. > > > 0004: The include should use <lz4.h> and not "lz4.h" > > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.
------- Original Message ------- On Sunday, June 26th, 2022 at 5:55 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > Hi, > > Will you be able to send a rebased patch for the next CF ? Thank you for taking an interest in the PR. The plan is indeed to send a new version. > If you update for the review comments I sent in March, I'll plan to do another > round of review. Thank you. > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > > shouldn't be handled more gracefully. LZ4 should either have an explicit > > version check, or else shouldn't depend on that feature (or should define a > > safe fallback version if the library header doesn't define it). > > > > https://packages.ubuntu.com/liblz4-1 > > > > 0003: typo: of legacy => or legacy > > > > There are a large number of ifdefs being added here - it'd be nice to minimize > > that. basebackup was organized to use separate files, which is one way. > > > > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c > > src/bin/pg_dump/compress_io.c:19 > > > > In last year's CF entry, I had made a union within CompressorState. LZ4 > > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer, > > ZSTD_CStream). > > > > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff > > from commit ffd53659c. You're passing both the compression method and level. > > I think there should be a structure which includes both. In the future, that > > can also handle additional options. I hope to re-use these same things for > > wal_compression=method:level. > > > > You renamed this: > > > > |- COMPR_ALG_LIBZ > > |-} CompressionAlgorithm; > > |+ COMPRESSION_GZIP, > > |+} CompressionMethod; > > > > ..But I don't think that's an improvement.
If you were to change it, it should > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > > structs and typedefs. zlib is not idential to gzip, which uses a different > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. > > > > The cf* changes in pg_backup_archiver could be split out into a separate > > commit. It's strictly a code simplification - not just preparation for more > > compression algorithms. The commit message should "See also: > > bf9aa490db24b2334b3595ee33653bf2fe39208c". > > > > The changes in 0002 for cfopen_write seem insufficient: > > |+ if (compressionMethod == COMPRESSION_NONE) > > |+ fp = cfopen(path, mode, compressionMethod, 0); > > | else > > | { > > | #ifdef HAVE_LIBZ > > | char *fname; > > | > > | fname = psprintf("%s.gz", path); > > |- fp = cfopen(fname, mode, compression); > > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel); > > | free_keep_errno(fname); > > | #else > > > > The only difference between the LIBZ and uncompressed case is the file > > extension, and it'll be the only difference with LZ4 too. So I suggest to > > first handle the file extension, and the rest of the code path is not > > conditional on the compression method. I don't think cfopen_write even needs > > HAVE_LIBZ - can't you handle that in cfopen_internal() ? > > > > This patch rejects -Z0, which ought to be accepted: > > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc > > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set > > > > Your 0003 patch shouldn't reference LZ4: > > +#ifndef HAVE_LIBLZ4 > > + if (*compressionMethod == COMPRESSION_LZ4) > > + supports_compression = false; > > +#endif > > > > The 0004 patch renames zlibOutSize to outsize - I think the patch series should > > be constructed such as to minimize the size of the method-specific patches. I > > say this anticipating also adding support for zstd. 
The preliminary patches > > should have all the boring stuff. It would help for reviewing to keep the > > patches split up, or to enumerate all the boring things that are being renamed > > (like change OutputContext to cfp, rename zlibOutSize, ...). > > > > 0004: The include should use <lz4.h> and not "lz4.h" > > > > freebsd/cfbot is failing. > > > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > > exercise it more on CI. > > > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > > > You're passing both the compression method and level. I think there should > > > be a structure which includes both. In the future, that can also handle > > > additional options. > > > > I'm not sure if there's anything worth saving, but I did that last year with > > 0003-Support-multiple-compression-algs-levels-opts.patch > > I sent a rebased copy off-list. > > https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391 > > > > | fatal("not built with LZ4 support"); > > | fatal("not built with lz4 support"); > > > > Please use consistent capitalization of "lz4" - then the compiler can optimize > > away duplicate strings. > > > > > 0004: The include should use <lz4.h> and not "lz4.h" > > > > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908. > >
------- Original Message ------- On Sunday, June 26th, 2022 at 5:55 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > Hi, > > Will you be able to send a rebased patch for the next CF ? Please find a rebased and heavily refactored patchset. Since parts of this patchset were already committed, I restarted numbering. I am not certain if this is the preferred way. This makes alignment with previous comments a bit harder. > If you update for the review comments I sent in March, I'll plan to do another > round of review. I have updated for "some" of the comments. This is not an unwillingness to incorporate those specific comments. Simply, this patchset had already started to diverge heavily, based on comments from Mr. Paquier, who had requested that the APIs be refactored to use function pointers. This is happening in 0002 of the patchset. 0001 of the patchset is using the new compression.h under common. This patchset should be considered a late draft, as commentary, documentation, and some finer details are not yet finalized, because I am expecting the proposed refactor to receive a wealth of comments. It would be helpful to understand whether the proposed direction is something worth working on, before moving to the finer details. For what it's worth, I am the sole author of the current patchset. Cheers, //Georgios > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > > shouldn't be handled more gracefully. LZ4 should either have an explicit > > version check, or else shouldn't depend on that feature (or should define a > > safe fallback version if the library header doesn't define it). > > > > https://packages.ubuntu.com/liblz4-1 > > > > 0003: typo: of legacy => or legacy > > > > There are a large number of ifdefs being added here - it'd be nice to minimize > > that.
basebackup was organized to use separate files, which is one way. > > > > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c > > src/bin/pg_dump/compress_io.c:19 > > > > In last year's CF entry, I had made a union within CompressorState. LZ4 > > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer, > > ZSTD_CStream). > > > > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff > > from commit ffd53659c. You're passing both the compression method and level. > > I think there should be a structure which includes both. In the future, that > > can also handle additional options. I hope to re-use these same things for > > wal_compression=method:level. > > > > You renamed this: > > > > |- COMPR_ALG_LIBZ > > |-} CompressionAlgorithm; > > |+ COMPRESSION_GZIP, > > |+} CompressionMethod; > > > > ..But I don't think that's an improvement. If you were to change it, it should > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > > structs and typedefs. zlib is not idential to gzip, which uses a different > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. > > > > The cf* changes in pg_backup_archiver could be split out into a separate > > commit. It's strictly a code simplification - not just preparation for more > > compression algorithms. The commit message should "See also: > > bf9aa490db24b2334b3595ee33653bf2fe39208c". 
> > > > The changes in 0002 for cfopen_write seem insufficient: > > |+ if (compressionMethod == COMPRESSION_NONE) > > |+ fp = cfopen(path, mode, compressionMethod, 0); > > | else > > | { > > | #ifdef HAVE_LIBZ > > | char *fname; > > | > > | fname = psprintf("%s.gz", path); > > |- fp = cfopen(fname, mode, compression); > > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel); > > | free_keep_errno(fname); > > | #else > > > > The only difference between the LIBZ and uncompressed case is the file > > extension, and it'll be the only difference with LZ4 too. So I suggest to > > first handle the file extension, and the rest of the code path is not > > conditional on the compression method. I don't think cfopen_write even needs > > HAVE_LIBZ - can't you handle that in cfopen_internal() ? > > > > This patch rejects -Z0, which ought to be accepted: > > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc > > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set > > > > Your 0003 patch shouldn't reference LZ4: > > +#ifndef HAVE_LIBLZ4 > > + if (*compressionMethod == COMPRESSION_LZ4) > > + supports_compression = false; > > +#endif > > > > The 0004 patch renames zlibOutSize to outsize - I think the patch series should > > be constructed such as to minimize the size of the method-specific patches. I > > say this anticipating also adding support for zstd. The preliminary patches > > should have all the boring stuff. It would help for reviewing to keep the > > patches split up, or to enumerate all the boring things that are being renamed > > (like change OutputContext to cfp, rename zlibOutSize, ...). > > > > 0004: The include should use <lz4.h> and not "lz4.h" > > > > freebsd/cfbot is failing. > > > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > > exercise it more on CI. 
> > > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > > > You're passing both the compression method and level. I think there should > > > be a structure which includes both. In the future, that can also handle > > > additional options. > > > > I'm not sure if there's anything worth saving, but I did that last year with > > 0003-Support-multiple-compression-algs-levels-opts.patch > > I sent a rebased copy off-list. > > https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391 > > > > | fatal("not built with LZ4 support"); > > | fatal("not built with lz4 support"); > > > > Please use consistent capitalization of "lz4" - then the compiler can optimize > > away duplicate strings. > > > > > 0004: The include should use <lz4.h> and not "lz4.h" > > > > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908. > >
This is a review of 0001. On Tue, Jul 05, 2022 at 01:22:47PM +0000, gkokolatos@pm.me wrote: > Simply this patchset had started to divert > heavily already based on comments from Mr. Paquier who had already requested for > the APIs to be refactored to use function pointers. This is happening in 0002 of > the patchset. I said something about reducing ifdefs, but I'm having trouble finding what Michael said about this ? > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > > > > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > > > shouldn't be handled more gracefully. LZ4 should either have an explicit > > > version check, or else shouldn't depend on that feature (or should define a > > > safe fallback version if the library header doesn't define it). > > > https://packages.ubuntu.com/liblz4-1 The constant still seems to be used without defining a fallback or a minimum version. > > > 0003: typo: of legacy => or legacy This is still there > > > You renamed this: > > > > > > |- COMPR_ALG_LIBZ > > > |-} CompressionAlgorithm; > > > |+ COMPRESSION_GZIP, > > > |+} CompressionMethod; > > > > > > ..But I don't think that's an improvement. If you were to change it, it should > > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > > > structs and typedefs. zlib is not idential to gzip, which uses a different > > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. This comment still applies - zlib's gz* functions are "gzip" but the others are "zlib". https://zlib.net/manual.html That affects both the 0001 and 0002 patches. Actually, I think that "gzip" should not be the name of the user-facing option, since (except for "plain" format) it isn't using gzip. +Robert, since this suggests amending parse_compress_algorithm(). 
Maybe "zlib" should be parsed the same way as "gzip" - I don't think we ever expose both to a user, but in some cases (basebackup and pg_dump -Fp -Z1) the output is "gzip" and in some cases NO it's zlib (pg_dump -Fc -Z1). > > > The cf* changes in pg_backup_archiver could be split out into a separate > > > commit. It's strictly a code simplification - not just preparation for more > > > compression algorithms. The commit message should "See also: > > > bf9aa490db24b2334b3595ee33653bf2fe39208c". I still think this could be an early, 0000 patch. > > > freebsd/cfbot is failing. This is still failing for bsd, windows and compiler warnings. Windows also has compiler warnings. http://cfbot.cputube.org/georgios-kokolatos.html Please see: src/tools/ci/README, which you can use to run check-world on 4 OS by pushing a branch to github. > > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > > > exercise it more on CI. What about this ? I think the patch needs to pass CI on all 4 OS with default=zlib and default=lz4. > > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > @@ -254,7 +251,12 @@ CreateArchive(const char *FileSpec, const ArchiveFormat fmt, > Archive * > OpenArchive(const char *FileSpec, const ArchiveFormat fmt) > { > - ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, setupRestoreWorker); > + ArchiveHandle *AH; > + pg_compress_specification compress_spec; Should this be initialized to {0} ? > @@ -969,6 +969,8 @@ NewRestoreOptions(void) > opts->format = archUnknown; > opts->cparams.promptPassword = TRI_DEFAULT; > opts->dumpSections = DUMP_UNSECTIONED; > + opts->compress_spec.algorithm = PG_COMPRESSION_NONE; > + opts->compress_spec.level = INT_MIN; Why INT_MIN ? > @@ -1115,23 +1117,28 @@ PrintTOCSummary(Archive *AHX) > ArchiveHandle *AH = (ArchiveHandle *) AHX; > RestoreOptions *ropt = AH->public.ropt; > TocEntry *te; > + pg_compress_specification out_compress_spec; Should have {0} ? 
I suggest to write it like my 2020 patch for this, which says: no_compression = {0}; > /* Open stdout with no compression for AH output handle */ > - AH->gzOut = 0; > - AH->OF = stdout; > + out_compress_spec.algorithm = PG_COMPRESSION_NONE; > + AH->OF = cfdopen(dup(fileno(stdout)), PG_BINARY_A, out_compress_spec); Ideally this should check the success of dup(). > @@ -3776,21 +3746,25 @@ ReadHead(ArchiveHandle *AH) > + if (AH->compress_spec.level != INT_MIN) Why is it testing the level and not the algorithm ? > --- a/src/bin/pg_dump/pg_backup_custom.c > +++ b/src/bin/pg_dump/pg_backup_custom.c > @@ -298,7 +298,7 @@ _StartData(ArchiveHandle *AH, TocEntry *te) > _WriteByte(AH, BLK_DATA); /* Block type */ > WriteInt(AH, te->dumpId); /* For sanity check */ > > - ctx->cs = AllocateCompressor(AH->compression, _CustomWriteFunc); > + ctx->cs = AllocateCompressor(AH->compress_spec, _CustomWriteFunc); Is it necessary to rename the data structure ? If not, this file can remain unchanged. > --- a/src/bin/pg_dump/pg_backup_directory.c > +++ b/src/bin/pg_dump/pg_backup_directory.c > @@ -573,6 +574,7 @@ _CloseArchive(ArchiveHandle *AH) > if (AH->mode == archModeWrite) > { > cfp *tocFH; > + pg_compress_specification compress_spec; Should use {0} ? > @@ -639,12 +642,14 @@ static void > _StartBlobs(ArchiveHandle *AH, TocEntry *te) > { > lclContext *ctx = (lclContext *) AH->formatData; > + pg_compress_specification compress_spec; Same > + /* > + * Custom and directory formats are compressed by default (zlib), others > + * not > + */ > + if (user_compression_defined == false) Should be: !user_compression_defined Your 0001+0002 patches (without 0003) fail to compile: pg_backup_directory.c: In function ‘_ReadByte’: pg_backup_directory.c:519:12: error: ‘CompressFileHandle’ {aka ‘struct CompressFileHandle’} has no member named ‘_IO_getc’ 519 | return CFH->getc(CFH); | ^~ pg_backup_directory.c:520:1: warning: control reaches end of non-void function [-Wreturn-type] 520 | } -- Justin
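The zlib-vs-gzip distinction Justin raises is visible in the stream framing itself. As an aside (this code is illustrative only, not from any patch; the names are invented), the two formats can be told apart from their first two bytes:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FMT_UNKNOWN, FMT_GZIP, FMT_ZLIB } StreamFormat;

static StreamFormat
classify_stream(const unsigned char *buf, size_t len)
{
    /* gzip members start with the magic bytes 0x1f 0x8b (RFC 1952) */
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return FMT_GZIP;

    /*
     * A zlib stream starts with a CMF/FLG pair: the low nibble of CMF is 8
     * (deflate) and the two bytes, read big-endian, are a multiple of 31
     * (RFC 1950).  The common first byte is 0x78 for a 32K window.
     */
    if (len >= 2 && (buf[0] & 0x0f) == 8 && ((buf[0] << 8) | buf[1]) % 31 == 0)
        return FMT_ZLIB;

    return FMT_UNKNOWN;
}
```

Both framings wrap the same deflate payload, which is why zlib's `gz*` functions and its in-memory `deflate()` interface can produce incompatible files from the same library, as discussed above.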
On Tue, Jul 05, 2022 at 01:22:47PM +0000, gkokolatos@pm.me wrote: > I have updated for "some" of the comments. This is not an unwillingness to > incorporate those specific comments. Simply this patchset had started to divert > heavily already based on comments from Mr. Paquier who had already requested for > the APIs to be refactored to use function pointers. This is happening in 0002 of > the patchset. 0001 of the patchset is using the new compression.h under common. > > This patchset should be considered a late draft, as commentary, documentation, > and some finer details are not yet finalized; because I am expecting the proposed > refactor to receive a wealth of comments. It would be helpful to understand if > the proposed direction is something worth to be worked upon, before moving to the > finer details. I have read through the patch set, and I like a lot the separation you are doing here with CompressFileHandle where a compression method has to specify a full set of callbacks depending on the actions that need to be taken. One advantage, as your patch shows, is that you reduce the dependency of each code path on the compression method, with #ifdefs and such located mostly into their own file structure, so adding a new compression method becomes much easier. These callbacks are going to require much more documentation to describe what anybody using them should expect from them, and perhaps they could be renamed in a more generic way as the current names come from POSIX (say read_char(), read_string()?), even if this patch has just inherited the names coming from pg_dump itself, but this can be tuned over and over. The split into three parts as of 0001 to plug into pg_dump the new compression option set, 0002 to introduce the callbacks and 0003 to add LZ4, building on the two first parts, makes sense to me. 0001 and 0002 could be done in a reversed order as they are mostly independent, but this order is fine as-is. 
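The callback-table design described above could be sketched roughly as follows. This is an illustrative mock-up with invented names (`read_func`, `MemBuf`), not the shape of the actual patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented sketch of a per-method callback table; not the patch's API. */
typedef struct CompressFileHandle CompressFileHandle;
struct CompressFileHandle
{
    size_t      (*read_func) (CompressFileHandle *fh, void *buf, size_t len);
    size_t      (*write_func) (CompressFileHandle *fh, const void *buf, size_t len);
    void       *private_data;   /* per-method state, e.g. a z_stream */
};

/* A "none" (passthrough) implementation backed by an in-memory buffer. */
typedef struct
{
    char        data[256];
    size_t      len;
    size_t      pos;
} MemBuf;

static size_t
mem_write(CompressFileHandle *fh, const void *buf, size_t len)
{
    MemBuf     *m = fh->private_data;

    memcpy(m->data + m->len, buf, len);
    m->len += len;
    return len;
}

static size_t
mem_read(CompressFileHandle *fh, void *buf, size_t len)
{
    MemBuf     *m = fh->private_data;
    size_t      n = (m->len - m->pos < len) ? m->len - m->pos : len;

    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}
```

A gzip or LZ4 variant would fill the same table with its own functions, so callers such as pg_backup_archiver.c would only ever dispatch through the pointers and never see a compression-specific #ifdef.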
In short, I am fine with the proposed approach. +#define K_VERS_1_15 MAKE_ARCHIVE_VERSION(1, 15, 0) /* add compressionMethod + * in header */ Indeed, the dump format needs a version bump for this information. +static bool +parse_compression_option(const char *opt, + pg_compress_specification *compress_spec) This parsing logic in pg_dump.c looks a lot like what pg_receivewal.c does with its parse_compress_options() where, for compatibility: - If only a number is given: -- Assume no compression if level is 0. -- Assume gzip with given compression if level > 0. - If a string is found, assume a full spec, with optionally a level. So some consolidation could be done between both. By the way, I can see that GZCLOSE(), etc. are still defined in compress_io.h but they are not used. -- Michael
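The pg_receivewal-style compatibility rule Michael describes — a bare integer keeps its historical meaning, anything else is parsed as a method name with an optional level — could look like this. A hypothetical helper, not the code in common/compression.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { COMPR_NONE, COMPR_GZIP, COMPR_LZ4 } ComprAlg;
typedef struct { ComprAlg alg; int level; } ComprSpec;

static bool
parse_compress_option(const char *opt, ComprSpec *spec)
{
    char       *end;
    long        n = strtol(opt, &end, 10);

    /* A bare integer: 0 disables compression, anything else means gzip. */
    if (*opt != '\0' && *end == '\0')
    {
        spec->alg = (n == 0) ? COMPR_NONE : COMPR_GZIP;
        spec->level = (int) n;
        return true;
    }

    /* Otherwise expect "method" or "method:level". */
    {
        const char *colon = strchr(opt, ':');
        size_t      mlen = colon ? (size_t) (colon - opt) : strlen(opt);

        spec->level = colon ? atoi(colon + 1) : -1;     /* -1: method default */
        if (mlen == 4 && strncmp(opt, "gzip", 4) == 0)
            spec->alg = COMPR_GZIP;
        else if (mlen == 3 && strncmp(opt, "lz4", 3) == 0)
            spec->alg = COMPR_LZ4;
        else if (mlen == 4 && strncmp(opt, "none", 4) == 0)
            spec->alg = COMPR_NONE;
        else
            return false;
    }
    return true;
}
```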
Attachments
This entry has been waiting on author input for a while (our current threshold is roughly two weeks), so I've marked it Returned with Feedback. Once you think the patchset is ready for review again, you (or any interested party) can resurrect the patch entry by visiting https://commitfest.postgresql.org/38/3571/ and changing the status to "Needs Review", and then changing the status again to "Move to next CF". (Don't forget the second step; hopefully we will have streamlined this in the near future!) Thanks, --Jacob
Thank you for your work during commitfest. The patch is still in development. Given vacation status, expect the next patches to be ready for the November commitfest. For now it has moved to the September one. Further action will be taken then as needed. Enjoy the rest of the summer!
Checking if you'll be able to submit new patches soon ?
Thank you for checking up. Expect new versions within this commitfest cycle.
On Fri, Aug 05, 2022 at 02:23:45PM +0000, Georgios Kokolatos wrote: > Thank you for your work during commitfest. > > The patch is still in development. Given vacation status, expect the next patches to be ready for the November commitfest. > For now it has moved to the September one. Further action will be taken then as needed. On Sun, Nov 06, 2022 at 02:53:12PM +0000, gkokolatos@pm.me wrote: > On Wed, Nov 2, 2022 at 14:28, Justin Pryzby <pryzby@telsasoft.com> wrote: > > Checking if you'll be able to submit new patches soon ? > > Thank you for checking up. Expect new versions within this commitfest cycle. Hi, I think this patch record should be closed for now. You can re-open the existing patch record once a patch is ready to be reviewed. The commitfest is a time for committing/reviewing patches that were previously submitted, but there's been no new patch since July. Making a patch available for review at the start of the commitfest seems like a requirement for current patch records (same as for new patch records). I wrote essentially the same patch as your early patches 2 years ago (before postgres was ready to consider new compression algorithms), so I'm happy to review a new patch when it's available, regardless of its status in the cfapp. BTW, some of my own review comments from March weren't addressed. Please check. Also, in February, I asked if you knew how to use cirrus-ci to run checks, but the patches still had compilation errors and warnings on various OS. https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3571 -- Justin
On Sun, Nov 20, 2022 at 11:26:11AM -0600, Justin Pryzby wrote: > I think this patch record should be closed for now. You can re-open the > existing patch record once a patch is ready to be reviewed. Indeed. As of things are, this is just a dead entry in the CF which would be confusing. I have marked it as RwF. -- Michael
Attachments
------- Original Message ------- On Monday, November 21st, 2022 at 12:13 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Sun, Nov 20, 2022 at 11:26:11AM -0600, Justin Pryzby wrote: > > > I think this patch record should be closed for now. You can re-open the > > existing patch record once a patch is ready to be reviewed. > > > Indeed. As of things are, this is just a dead entry in the CF which > would be confusing. I have marked it as RwF. Thank you for closing it. For the record, I am currently working on it; I am simply unsure whether I should submit WIP patches and add noise to the list, or wait until it is in a state where I feel that the comments have been addressed. A new version that I feel is in a decent enough state for review should be ready within this week. I am happy to drop the patch if you think I should not work on it though. Cheers, //Georgios > -- > Michael
On Tue, Nov 22, 2022 at 10:00:47AM +0000, gkokolatos@pm.me wrote: > A new version that I feel that is in a decent enough state for review should > be ready within this week. I am happy to drop the patch if you think I should > not work on it though. If you can post a new version of the patch, that's fine, of course. I'll be happy to look over it more. -- Michael
Attachments
On Tue, Nov 22, 2022 at 10:00:47AM +0000, gkokolatos@pm.me wrote: > For the record I am currently working on it simply unsure if I should submit > WIP patches and add noise to the list or wait until it is in a state that I > feel that the comments have been addressed. > > A new version that I feel that is in a decent enough state for review should > be ready within this week. I am happy to drop the patch if you think I should > not work on it though. I hope you'll want to continue work on it. The patch record is like a request for review, so it's closed if there's nothing ready to review. I think you should re-send patches (and update the CF app) as often as they're ready for more review. Your 001 commit (which is almost the same as what I wrote 2 years ago) still needs to account for some review comments, and the whole patch set ought to pass cirrusci tests. At that point, you'll be ready for another round of review, even if there's known TODO/FIXME items in later patches. BTW I saw that you updated your branch on github. You'll need to make the corresponding changes to ./meson.build that you made to ./Makefile. https://wiki.postgresql.org/wiki/Meson_for_patch_authors https://wiki.postgresql.org/wiki/Meson -- Justin
------- Original Message ------- On Tuesday, November 22nd, 2022 at 11:49 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Tue, Nov 22, 2022 at 10:00:47AM +0000, gkokolatos@pm.me wrote: > > > A new version that I feel that is in a decent enough state for review should > > be ready within this week. I am happy to drop the patch if you think I should > > not work on it though. > > > If you can post a new version of the patch, that's fine, of course. > I'll be happy to look over it more. Thank you Michael (and Justin). Allow me to present v8. The focus of this version of the series is 0001 and 0002. Admittedly, 0001 could be presented in a separate thread, though given its size and proximity to the topic I present it here. In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's parsing of compression options. However there exists a substantial difference in the behaviour of the two programs; one treats the lack of support for the requested algorithm as a fatal error, whereas the other does not. The existing functions in common/compression.c do not account for the latter. 0002 proposes an implementation for this. Its usefulness is shown in 0003. Please consider 0003-0005 as work in progress. They are differences from v7, yet they may contain unaddressed comments for now. Feedback on the splitting and/or reordering of 0003-0005 would be welcome. I think that they now split into coherent units and are presented in a logical order. Let me know if you disagree and where the breakpoints should be. Cheers, //Georgios > -- > Michael
Attachments
On Mon, Nov 28, 2022 at 04:32:43PM +0000, gkokolatos@pm.me wrote: > The focus of this version of this series is 0001 and 0002. > > Admittedly 0001 could be presented in a separate thread though given its size and > proximity to the topic, I present it here. I don't mind. This was a hole in meson.build, so nice catch! I have noticed a second defect with pg_verifybackup for all the commands, and applied both at the same time. > In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's > parsing of compression options. However there exists a substantial difference in the > behaviour of the two programs; one treats the lack of support for the requested > algorithm as a fatal error, whereas the other does not. The existing functions in > common/compression.c do not account for the later. 0002 proposes an implementation > for this. It's usefulness is shown in 0003. In what does it matter? The logic in compression.c provides an error when looking at a spec or validating it, but the caller is free to consume it as it wants because this is shared between the frontend and the backend, and that includes consuming it as a warning rather than a hard failure. If we don't want to issue an error and force non-compression if attempting to use a compression method not supported in pg_dump, that's fine by me as a historical behavior, but I don't see why these routines have any need to be split more as proposed in 0002. Saying that, I do agree that it would be nice to remove the duplication between the option parsing of pg_basebackup and pg_receivewal. Your patch is very close to that, actually, and it occurred to me that if we move the check on "server-" and "client-" in pg_basebackup to be just before the integer-only check then we can consolidate the whole thing. 
Attached is an alternative that does not sacrifice the pluggability of the existing routines while allowing 0003~ to still use them (I don't really want to move around the checks on the supported build options now in parse_compress_specification(), that was hard enough to settle on this location). On top of that, pg_basebackup is able to cope with the case of --compress=0 already, enforcing "none" (BaseBackup could be simplified a bit more before StartLogStreamer). This refactoring shaves a little bit of code. > Please consider 0003-0005 as work in progress. They are differences from v7 yet they > may contain unaddressed comments for now. Okay. -- Michael
Attachments
On Tue, Nov 29, 2022 at 03:19:17PM +0900, Michael Paquier wrote: > Attached is an alternative that does not sacrifice the pluggability of > the existing routines while allowing 0003~ to still use them (I don't > really want to move around the checks on the supported build options > now in parse_compress_specification(), that was hard enough to settle > on this location). On top of that, pg_basebackup is able to cope with > the case of --compress=0 already, enforcing "none" (BaseBackup could > be simplified a bit more before StartLogStreamer). This refactoring > shaves a little bit of code. One thing that I forgot to mention is that this refactoring would treat things like server-N, client-N as valid grammars (in this case N takes precedence over an optional detail string), implying that N = 0 is "none" and N > 0 is gzip, so that makes for an extra grammar flavor without impacting the existing ones. I am not sure that it is worth documenting, still worth mentioning. -- Michael
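The extra grammar flavor described here amounts to stripping an optional location prefix before applying the integer-vs-name rule; a hypothetical sketch (`strip_compress_location` is an invented name, not pg_basebackup's code):

```c
#include <assert.h>
#include <string.h>

typedef enum { LOC_UNSPECIFIED, LOC_CLIENT, LOC_SERVER } ComprLocation;

/*
 * Peel off "server-"/"client-" so the remainder can be handed to the
 * common integer-or-name parser: "server-5" then means server-side gzip
 * level 5, and "client-0" means no client-side compression.
 */
static const char *
strip_compress_location(const char *opt, ComprLocation *loc)
{
    if (strncmp(opt, "server-", 7) == 0)
    {
        *loc = LOC_SERVER;
        return opt + 7;
    }
    if (strncmp(opt, "client-", 7) == 0)
    {
        *loc = LOC_CLIENT;
        return opt + 7;
    }
    *loc = LOC_UNSPECIFIED;
    return opt;
}
```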
Attachments
------- Original Message ------- On Tuesday, November 29th, 2022 at 7:19 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Mon, Nov 28, 2022 at 04:32:43PM +0000, gkokolatos@pm.me wrote: > > > The focus of this version of this series is 0001 and 0002. > > > > Admittedly 0001 could be presented in a separate thread though given its size and > > proximity to the topic, I present it here. > > > I don't mind. This was a hole in meson.build, so nice catch! I have > noticed a second defect with pg_verifybackup for all the commands, and > applied both at the same time. Thank you. > > > In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's > > parsing of compression options. However there exists a substantial difference in the > > behaviour of the two programs; one treats the lack of support for the requested > > algorithm as a fatal error, whereas the other does not. The existing functions in > > common/compression.c do not account for the later. 0002 proposes an implementation > > for this. It's usefulness is shown in 0003. > > > In what does it matter? The logic in compression.c provides an error > when looking at a spec or validating it, but the caller is free to > consume it as it wants because this is shared between the frontend and > the backend, and that includes consuming it as a warning rather than a > ahrd failure. If we don't want to issue an error and force > non-compression if attempting to use a compression method not > supported in pg_dump, that's fine by me as a historical behavior, but > I don't see why these routines have any need to be split more as > proposed in 0002. I understand. The reason for the change in the routines was because it was impossible to distinguish a genuine parse error from a missing library in parse_compress_specification(). 
If the zlib library is missing, then both '--compress=gzip:garbage' and '--compress=gzip:7' would populate the parse_error member of the struct and subsequent calls to validate_compress_specification() would error out, although only one of the two options is truly an error. Historically the code would fail on invalid input regardless of whether the library was present or not. > Saying that, I do agree that it would be nice to remove the > duplication between the option parsing of pg_basebackup and > pg_receivewal. Your patch is very close to that, actually, and it > occured to me that if we move the check on "server-" and "client-" in > pg_basebackup to be just before the integer-only check then we can > consolidate the whole thing. Great. I did notice the possible benefit but chose to not tread too far off the necessary in my patch. > Attached is an alternative that does not sacrifice the pluggability of > the existing routines while allowing 0003~ to still use them (I don't > really want to move around the checks on the supported build options > now in parse_compress_specification(), that was hard enough to settle > on this location). Yeah, I thought that it would be a hard sell, hence an "earlier" version. The attached version 10, contains verbatim your proposed v9 as 0001. Then 0002 is switching a bit the parsing order in pg_dump and will not fail as described above on missing libraries. Now, it will first parse the algorithm, discard it when unsupported, and only parse the rest of the option if the algorithm is supported. Granted it is a bit 'uglier' with the preprocessing blocks, yet it maintains most of the historic behaviour without altering the common compression interfaces. Now, as shown in 001_basic.pl, invalid detail will fail only if the algorithm is supported. > On top of that, pg_basebackup is able to cope with > the case of --compress=0 already, enforcing "none" (BaseBackup could > be simplified a bit more before StartLogStreamer). 
This refactoring > shaves a little bit of code. > > > Please consider 0003-0005 as work in progress. They are differences from v7 yet they > > may contain unaddressed comments for now. > > > Okay. Thank you. Please advise if it is preferable to split 0002 in two parts. I think not, but I will happily do so if you think otherwise. Cheers, //Georgios > -- > Michael
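The reordering Georgios describes — discarding an unsupported algorithm before ever validating its detail string — can be sketched like this, with a `have_libz` boolean standing in for the compile-time HAVE_LIBZ test. All names here are invented for illustration, not the patch's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { PARSE_OK, PARSE_UNSUPPORTED, PARSE_ERROR } ParseResult;

static ParseResult
parse_gzip_option(const char *opt, bool have_libz, int *level)
{
    if (strncmp(opt, "gzip", 4) == 0 && (opt[4] == '\0' || opt[4] == ':'))
    {
        const char *detail = (opt[4] == ':') ? opt + 5 : NULL;

        /*
         * Bail out on the algorithm before looking at the detail, so
         * "gzip:garbage" without zlib is "unsupported" (a warning in the
         * historical behavior), not a bogus-detail hard error.
         */
        if (!have_libz)
            return PARSE_UNSUPPORTED;

        if (detail)
        {
            char       *end;
            long        n = strtol(detail, &end, 10);

            if (*detail == '\0' || *end != '\0' || n < 1 || n > 9)
                return PARSE_ERROR;
            *level = (int) n;
        }
        return PARSE_OK;
    }
    return PARSE_ERROR;
}
```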
Attachments
On Tue, Nov 29, 2022 at 12:10:46PM +0000, gkokolatos@pm.me wrote: > Thank you. Please advice if is preferable to split 0002 in two parts. > I think not but I will happily do so if you think otherwise. This one makes me curious. What kind of split are you talking about? If it makes the code review and the git history cleaner and easier, I am usually a lot in favor of such incremental changes. As far as I can see, there is the switch from the compression integer to compression specification as one thing. The second thing is the refactoring of cfclose() and these routines, paving the way for 0003. Hmm, it may be cleaner to move the switch to the compression spec in one patch, and move the logic around cfclose() to its own, paving the way to 0003. By the way, I think that this 0002 should drop all the default clauses in the switches for the compression method so as we'd catch any missing code paths with compiler warnings if a new compression method is added in the future. Anyway, I have applied 0001, adding you as a primary author because you did most of it with only tweaks from me for pg_basebackup. The docs of pg_basebackup have been amended to mention the slight change in grammar, affecting the case where we do not have a detail string. -- Michael
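The default-less switch Michael asks for leans on the compiler's switch-coverage warning; a minimal illustration (invented enum and function names):

```c
#include <assert.h>
#include <string.h>

typedef enum { METHOD_NONE, METHOD_GZIP, METHOD_LZ4 } Method;

static const char *
method_suffix(Method m)
{
    /*
     * No default: clause on purpose — with -Wswitch (part of -Wall in gcc
     * and clang), adding a new enum value without handling it here
     * produces a compile-time warning instead of a silent fallthrough.
     */
    switch (m)
    {
        case METHOD_NONE:
            return "";
        case METHOD_GZIP:
            return ".gz";
        case METHOD_LZ4:
            return ".lz4";
    }
    return "";                  /* unreachable; keeps return-path checks quiet */
}
```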
Attachments
------- Original Message ------- On Wednesday, November 30th, 2022 at 1:50 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Tue, Nov 29, 2022 at 12:10:46PM +0000, gkokolatos@pm.me wrote: > > > Thank you. Please advice if is preferable to split 0002 in two parts. > > I think not but I will happily do so if you think otherwise. > > > This one makes me curious. What kind of split are you talking about? > If it makes the code review and the git history cleaner and easier, I > am usually a lot in favor of such incremental changes. As far as I > can see, there is the switch from the compression integer to > compression specification as one thing. The second thing is the > refactoring of cfclose() and these routines, paving the way for 0003. > Hmm, it may be cleaner to move the switch to the compression spec in > one patch, and move the logic around cfclose() to its own, paving the > way to 0003. Fair enough. The attached v11 does that. 0001 introduces compression specification and is using it throughout. 0002 paves the way to the new interface by homogenizing the use of cfp. 0003 introduces the new API and stores the compression algorithm in the custom format header instead of the compression level integer. Finally 0004 adds support for LZ4. Besides the version bump in 0003, which can possibly be split out as an independent and earlier step, I think that the patchset consists of coherent units. > By the way, I think that this 0002 should drop all the default clauses > in the switches for the compression method so as we'd catch any > missing code paths with compiler warnings if a new compression method > is added in the future. Sure. > Anyway, I have applied 0001, adding you as a primary author because > you did most of it with only tweaks from me for pg_basebackup. The > docs of pg_basebackup have been amended to mention the slight change > in grammar, affecting the case where we do not have a detail string. Very kind of you, thank you. 
Cheers, //Georgios > -- > Michael
Attachments
On Wed, Nov 30, 2022 at 05:11:44PM +0000, gkokolatos@pm.me wrote: > Fair enough. The atteched v11 does that. 0001 introduces compression > specification and is using it throughout. 0002 paves the way to the > new interface by homogenizing the use of cfp. 0003 introduces the new > API and stores the compression algorithm in the custom format header > instead of the compression level integer. Finally 0004 adds support for > LZ4. I have been looking at 0001, and.. Hmm. I am really wondering whether it would not be better to just nuke this warning into orbit. This stuff enforces non-compression even if -Z has been used to a non-default value. This has been moved to its current location by cae2bb1 as of this thread: https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp However, this is only active if -Z is used when not building with zlib. At the end, it comes down to whether we want to prioritize the portability of pg_dump commands specifying a -Z/--compress across environments knowing that these may or may not be built with zlib, vs the amount of simplification/uniformity we would get across the binaries in the tree once we switch everything to use the compression specifications. Now that pg_basebackup and pg_receivewal are managed by compression specifications, and that we'd want more compression options for pg_dump, I would tend to do the latter and from now on complain if attempting to do a pg_dump -Z under --without-zlib with a compression level > 0. zlib is also widely available, and we don't document the fact that non-compression is enforced in this case, either. (Two TAP tests with the custom format had to be tweaked.) As per the patch, it is true that we do not need to bump the format of the dump archives, as we can still store only the compression level and guess the method from it. I have added some notes about that in ReadHead and WriteHead to not forget. 
Most of the changes are really straightforward, and it has resisted my tests, so I think that this is in a rather committable shape as-is. -- Michael
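The backward-compatibility rule noted for ReadHead — old archives store only a compression level, so the method has to be inferred from it — boils down to something like this sketch (not the actual ReadHead code):

```c
#include <assert.h>

typedef enum { INFER_NONE, INFER_GZIP } InferredMethod;

static InferredMethod
method_from_level(int level)
{
    /*
     * Pre-LZ4 archives only carry the zlib level in the header: 0 means no
     * compression, and anything else — including Z_DEFAULT_COMPRESSION,
     * i.e. -1 — means zlib, which is why no format bump is needed yet.
     */
    return (level == 0) ? INFER_NONE : INFER_GZIP;
}
```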
Attachments
------- Original Message ------- On Thursday, December 1st, 2022 at 3:05 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Wed, Nov 30, 2022 at 05:11:44PM +0000, gkokolatos@pm.me wrote: > > > Fair enough. The atteched v11 does that. 0001 introduces compression > > specification and is using it throughout. 0002 paves the way to the > > new interface by homogenizing the use of cfp. 0003 introduces the new > > API and stores the compression algorithm in the custom format header > > instead of the compression level integer. Finally 0004 adds support for > > LZ4. > > > I have been looking at 0001, and.. Hmm. I am really wondering > whether it would not be better to just nuke this warning into orbit. > This stuff enforces non-compression even if -Z has been used to a > non-default value. This has been moved to its current location by > cae2bb1 as of this thread: > https://www.postgresql.org/message-id/20160526.185551.242041780.horiguchi.kyotaro%40lab.ntt.co.jp > > However, this is only active if -Z is used when not building with > zlib. At the end, it comes down to whether we want to prioritize the > portability of pg_dump commands specifying a -Z/--compress across > environments knowing that these may or may not be built with zlib, > vs the amount of simplification/uniformity we would get across the > binaries in the tree once we switch everything to use the compression > specifications. Now that pg_basebackup and pg_receivewal are managed > by compression specifications, and that we'd want more compression > options for pg_dump, I would tend to do the latter and from now on > complain if attempting to do a pg_dump -Z under --without-zlib with a > compression level > 0. zlib is also widely available, and we don't > document the fact that non-compression is enforced in this case, > either. (Two TAP tests with the custom format had to be tweaked.) Fair enough. Thank you for looking. However I have a small comment on your new patch. 
- /* Custom and directory formats are compressed by default, others not */ - if (compressLevel == -1) - { -#ifdef HAVE_LIBZ - if (archiveFormat == archCustom || archiveFormat == archDirectory) - compressLevel = Z_DEFAULT_COMPRESSION; - else -#endif - compressLevel = 0; - } Nuking the warning from orbit and changing the behaviour around disabling the requested compression when the libraries are not present should not mean that we need to change the behaviour of default values for different formats. Please find v13 attached which reinstates it. This in itself got me looking and wondering why the tests succeeded. The only existing test covering that path is `defaults_dir_format` in `002_pg_dump.pl`. However as the test is currently written it does not check whether the output was compressed. The restore command would succeed in either case. A simple `gzip -t -r` against the directory will not suffice to test it, because there exist files which are never compressed in this format (.toc). A slightly more involved test case would need to be written, yet before I embark on this journey, I would like to know if you would agree to reinstate the defaults for those formats. > > As per the patch, it is true that we do not need to bump the format of > the dump archives, as we can still store only the compression level > and guess the method from it. I have added some notes about that in > ReadHead and WriteHead to not forget. Agreed. A minor suggestion if you may. #ifndef HAVE_LIBZ - if (AH->compression != 0) + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); #endif It would seem more consistent to error out in this case. We do error in all other cases where the compression is not available. 
Thank you. Cheers, //Georgios > -- > Michael
Attachments
On Thu, Dec 01, 2022 at 02:58:35PM +0000, gkokolatos@pm.me wrote: > Nuking the warning from orbit and changing the behaviour around disabling > the requested compression when the libraries are not present, should not > mean that we need to change the behaviour of default values for different > formats. Please find v13 attached which reinstates it. Gah, thanks! And this default behavior is documented as dependent on the compilation as well. > Which in itself it got me looking and wondering why the tests succeeded. > The only existing test covering that path is `defaults_dir_format` in > `002_pg_dump.pl`. However as the test is currently written it does not > check whether the output was compressed. The restore command would succeed > in either case. A simple `gzip -t -r` against the directory will not > suffice to test it, because there exist files which are never compressed > in this format (.toc). A little bit more involved test case would need > to be written, yet before I embark to this journey, I would like to know > if you would agree to reinstate the defaults for those formats. On top of my mind, I briefly recall that -r is not that portable. And the toc format makes the files generated non-deterministic as these use OIDs.. [.. thinks ..] We are going to need a new thing here, as compress_cmd cannot be directly used. What if we used only an array of glob()-able elements? Let's say "expected_contents" that could include a "dir_path/*.gz" conditional on $supports_gzip? glob() can only be calculated when the test is run as the file names cannot be known beforehand :/ >> As per the patch, it is true that we do not need to bump the format of >> the dump archives, as we can still store only the compression level >> and guess the method from it. I have added some notes about that in >> ReadHead and WriteHead to not forget. > > Agreed. A minor suggestion if you may. 
> > #ifndef HAVE_LIBZ > - if (AH->compression != 0) > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > #endif > > It would seem more consistent to error out in this case. We do error > in all other cases where the compression is not available. Makes sense. I have gone through the patch again, and applied it. Thanks! -- Michael
Attachments
------- Original Message ------- On Friday, December 2nd, 2022 at 2:56 AM, Michael Paquier <michael@paquier.xyz> wrote: > On top of my mind, I briefly recall that -r is not that portable. And > the toc format makes the files generated non-deterministic as these > use OIDs.. > > [.. thinks ..] > > We are going to need a new thing here, as compress_cmd cannot be > directly used. What if we used only an array of glob()-able elements? > Let's say "expected_contents" that could include a "dir_path/*.gz" > conditional on $supports_gzip? glob() can only be calculated when the > test is run as the file names cannot be known beforehand :/ You are very correct. However one can glob after the fact. Please find 0001 of the attached v14 which attempts to implement it. > I have gone through the patch again, and applied it. Thanks! Thank you. Please find the rest of the patchset series rebased on top of it. I dare say that 0002 is in a state worthy of your consideration. Cheers, //Georgios > -- > Michael
On Fri, Dec 02, 2022 at 04:15:10PM +0000, gkokolatos@pm.me wrote: > You are very correct. However one can glob after the fact. Please find > 0001 of the attached v14 which attempts to implement it. + if ($pgdump_runs{$run}->{glob_pattern}) + { + my $glob_pattern = $pgdump_runs{$run}->{glob_pattern}; + my @glob_output = glob($glob_pattern); + is(scalar(@glob_output) > 0, 1, "glob pattern matched") + } While this is correct in checking that the contents are compressed under --with-zlib, this also removes the coverage where we make sure that this command is able to complete under --without-zlib without compressing any of the table data files. Hence my point from upthread: this test had better not use compile_option, but change glob_pattern depending on if the build uses zlib or not. In order to check this behavior with defaults_custom_format, perhaps we could just remove the -Z6 from it or add an extra command for its default behavior? -- Michael
On Sat, Dec 03, 2022 at 11:45:30AM +0900, Michael Paquier wrote: > While this is correct in checking that the contents are compressed > under --with-zlib, this also removes the coverage where we make sure > that this command is able to complete under --without-zlib without > compressing any of the table data files. Hence my point from > upthread: this test had better not use compile_option, but change > glob_pattern depending on if the build uses zlib or not. In short, I mean something like the attached. I have named the flag content_patterns, and switched it to an array so as we can check that toc.dat is always uncompressed and that the other data files are compressed only when the build supports it. > In order to check this behavior with defaults_custom_format, perhaps > we could just remove the -Z6 from it or add an extra command for its > default behavior? This is slightly more complicated as there is just one file generated for the compression and non-compression cases, so I have left that as it is for now. -- Michael
------- Original Message ------- On Monday, December 5th, 2022 at 8:05 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Sat, Dec 03, 2022 at 11:45:30AM +0900, Michael Paquier wrote: > > > While this is correct in checking that the contents are compressed > > under --with-zlib, this also removes the coverage where we make sure > > that this command is able to complete under --without-zlib without > > compressing any of the table data files. Hence my point from > > upthread: this test had better not use compile_option, but change > > glob_pattern depending on if the build uses zlib or not. > > In short, I mean something like the attached. I have named the flag > content_patterns, and switched it to an array so as we can check that > toc.dat is always uncompressed and that the other data files are > compressed only when the build supports it. I see. This approach is much better than my proposal, thanks. If you allow me, I find 'content_patterns' to be slightly ambiguous. While it is true that it refers to the contents of a directory, it is not the contents of the dump that it is examining. I took the liberty of proposing an alternative name in the attached v16. I also took the liberty of applying the test pattern when the dump is explicitly compressed. > > In order to check this behavior with defaults_custom_format, perhaps > > we could just remove the -Z6 from it or add an extra command for its > > default behavior? > > This is slightly more complicated as there is just one file generated > for the compression and non-compression cases, so I have left that as > it is for now. I was thinking a bit more about this. I think that we can use the list TOC option of pg_restore. This option will first print out the header info which contains the compression. Perl utils already support parsing the generated output of a command. Please find an attempt to do so in the attached.
The benefits of having some testing for this case become a bit more obvious in 0004 of the patchset, when lz4 is introduced. Cheers, //Georgios > -- > Michael
On Mon, Dec 05, 2022 at 12:48:28PM +0000, gkokolatos@pm.me wrote: > I also took the liberty of applying the test pattern when the dump > is explicitly compressed. Sticking with glob_patterns is fine by me. > I was thinking a bit more about this. I think that we can use the list > TOC option of pg_restore. This option will first print out the header > info which contains the compression. Perl utils already support parsing > the generated output of a command. Please find an attempt to do > so in the attached. The benefits of having some testing for this case > become a bit more obvious in 0004 of the patchset, when lz4 is > introduced. This is where the fun is. What you are doing here is more complete, and we would make sure that the custom and directory formats would always see their contents compressed by default. And it would have caught the bug you mentioned upthread for the custom format. I have kept things as you proposed at the end, added a few comments, documented the new command_like and an extra command_like for defaults_dir_format. Glad to see this addressed, thanks! -- Michael
------- Original Message ------- On Tuesday, December 6th, 2022 at 1:22 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Mon, Dec 05, 2022 at 12:48:28PM +0000, gkokolatos@pm.me wrote: > > This is where the fun is. What you are doing here is more complete, > and we would make sure that the custom and directory formats would always > see their contents compressed by default. And it would have caught > the bug you mentioned upthread for the custom format. Thank you very much Michael. > I have kept things as you proposed at the end, added a few comments, > documented the new command_like and an extra command_like for > defaults_dir_format. Glad to see this addressed, thanks! Please find attached v17, which builds on top of what is already committed. I dare to consider 0001 ready to be reviewed. 0002 is also complete albeit with some documentation gaps. Cheers, //Georgios > -- > Michael
001: still refers to "gzip", which is correct for -Fp and -Fd but not for -Fc, for which it's more correct to say "zlib". That affects the name of the function, structures, comments, etc. I'm not sure if it's an issue to re-use the basebackup compression routines here. Maybe we should accept "-Zn" for zlib output (-Fc), but reject "gzip:9", which I'm sure some will find confusing, as it does not output gzip. Maybe 001 should be split into a patch to re-use the existing "cfp" interface (which is a clear win), and 002 to re-use the basebackup interfaces for user input and constants, etc. 001 still doesn't compile on freebsd, and 002 doesn't compile on windows. Have you checked test results from cirrusci on your private github account ? 002 says: + save_errno = errno; + errno = save_errno; I suppose that's intended to wrap the preceding library call. 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() doesn't store the passed-in compression_spec. 003 still uses <lz4.h> and not "lz4.h". Earlier this year I also suggested to include a 999 patch to change to use LZ4 as the default compression, to exercise the new code under CI. I suggest to re-open the cf patch entry after that passes tests on all platforms and when it's ready for more review. BTW, some of these review comments are the same as what I sent earlier this year. https://www.postgresql.org/message-id/20220326162156.GI28503%40telsasoft.com https://www.postgresql.org/message-id/20220705151328.GQ13040%40telsasoft.com -- Justin
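For reference, the save_errno fragment quoted above is a no-op as written: it saves errno and restores it with nothing in between. The usual intent of the idiom is to keep a cleanup call between the two statements so that the cleanup's errno clobbering does not leak to the caller. A minimal sketch of that intended pattern; clobbering_cleanup() is a hypothetical stand-in for a real call such as fclose(), not code from the patch:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical cleanup routine that, like many library calls, may
 * overwrite errno on its own internal failures. */
static void
clobbering_cleanup(void)
{
	errno = EBADF;				/* simulate a cleanup path touching errno */
}

/* Error path that preserves the original errno across the cleanup call,
 * so the caller reports the real cause of the failure. */
static int
fail_with_errno_preserved(void)
{
	int			save_errno = errno;

	clobbering_cleanup();		/* may overwrite errno */
	errno = save_errno;			/* restore the original cause */
	return -1;
}
```

With the cleanup call removed, as in the quoted fragment, the save/restore pair accomplishes nothing.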
On Sat, Dec 17, 2022 at 05:26:15PM -0600, Justin Pryzby wrote: > 001: still refers to "gzip", which is correct for -Fp and -Fd but not > for -Fc, for which it's more correct to say "zlib". Or should we begin by changing all these existing "not built with zlib support" error strings to the more generic "this build does not support compression with %s" to reduce the number of messages to translate? That would bring consistency with the other tools dealing with compression. > That affects the > name of the function, structures, comments, etc. I'm not sure if it's > an issue to re-use the basebackup compression routines here. Maybe we > should accept "-Zn" for zlib output (-Fc), but reject "gzip:9", which > I'm sure some will find confusing, as it does not output gzip. Maybe 001 > should be split into a patch to re-use the existing "cfp" interface > (which is a clear win), and 002 to re-use the basebackup interfaces for > user input and constants, etc. > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > windows. Have you checked test results from cirrusci on your private > github account ? 
FYI, I have re-added an entry to the CF app to get some automated coverage: https://commitfest.postgresql.org/41/3571/ On MinGW, a complaint about the open() callback, which I guess ought to be avoided with a rename: [00:16:37.254] compress_gzip.c:356:38: error: macro "open" passed 4 arguments, but takes just 3 [00:16:37.254] 356 | ret = CFH->open(fname, -1, mode, CFH); [00:16:37.254] | ^ [00:16:37.254] In file included from ../../../src/include/c.h:1309, [00:16:37.254] from ../../../src/include/postgres_fe.h:25, [00:16:37.254] from compress_gzip.c:15: On MSVC, some declaration conflicts, for a similar issue: [00:12:31.966] ../src/bin/pg_dump/compress_io.c(193): error C2371: '_read': redefinition; different basic types [00:12:31.966] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(252): note: see declaration of '_read' [00:12:31.966] ../src/bin/pg_dump/compress_io.c(210): error C2371: '_write': redefinition; different basic types [00:12:31.966] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(294): note: see declaration of '_write' > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > doesn't store the passed-in compression_spec. Hmm. This looks like a gap in the existing tests that we'd better fix first. This CI is green on Linux. > 003 still uses <lz4.h> and not "lz4.h". This should be <lz4.h>, not "lz4.h". -- Michael
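The MinGW failure above is easy to reproduce with any struct member that shares a name with a function-like macro from a system header; renaming the members sidesteps the preprocessor entirely. A minimal sketch with hypothetical member names (open_func, read_func) rather than the patch's actual identifiers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical file-handle vtable, similar in spirit to a compression
 * file handle.  If a platform header defines open() or read() as a
 * function-like macro, members literally named "open" or "read" get
 * mangled by the preprocessor; suffixed names avoid the collision. */
typedef struct CompressFileHandle CompressFileHandle;
struct CompressFileHandle
{
	int			(*open_func) (const char *path, int fd, const char *mode,
							  CompressFileHandle *CFH);
	size_t		(*read_func) (void *ptr, size_t size, CompressFileHandle *CFH);
};

static int
noop_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH)
{
	(void) fd;
	(void) CFH;
	return path != NULL && mode != NULL;	/* pretend to succeed */
}

static size_t
noop_read(void *ptr, size_t size, CompressFileHandle *CFH)
{
	(void) CFH;
	memset(ptr, 0, size);		/* pretend to read zeroes */
	return size;
}

static void
init_handle(CompressFileHandle *CFH)
{
	CFH->open_func = noop_open;
	CFH->read_func = noop_read;
}
```

A member named plain "open" would compile fine on most platforms and break only where the macro exists, which is why the problem surfaced on MinGW first.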
------- Original Message ------- On Monday, December 19th, 2022 at 5:06 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Sat, Dec 17, 2022 at 05:26:15PM -0600, Justin Pryzby wrote: > Thank you for the comments, please find v18 attached. > > 001: still refers to "gzip", which is correct for -Fp and -Fd but not > > for -Fc, for which it's more correct to say "zlib". > > > Or should we begin by changing all these existing "not built with zlib > support" error strings to the more generic "this build does not > support compression with %s" to reduce the number of messages to > translate? That would bring consistency with the other tools dealing > with compression. This has been the approach from 0002 onwards. In the attached it is also applied to the remaining location in 0001. > > > That affects the > > name of the function, structures, comments, etc. I'm not sure if it's > > an issue to re-use the basebackup compression routines here. Maybe we > > should accept "-Zn" for zlib output (-Fc), but reject "gzip:9", which > > I'm sure some will find confusing, as it does not output gzip. Maybe 001 > > should be split into a patch to re-use the existing "cfp" interface > > (which is a clear win), and 002 to re-use the basebackup interfaces for > > user input and constants, etc. > > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > > windows. Have you checked test results from cirrusci on your private > > github account ? There are still known gaps in 0002 and 0003, for example documentation, and I have not been focusing too much on those. You are right, it is helpful and kind to try to reduce the noise. The attached should have hopefully tackled the ci errors. > > FYI, I have re-added an entry to the CF app to get some automated > coverage: > https://commitfest.postgresql.org/41/3571/ Much obliged. Should I change the state to "ready for review" when I post a new version or should I leave that to the senior personnel? 
> > On MinGW, a complaint about the open() callback, which I guess ought to > be avoided with a rename: > [00:16:37.254] compress_gzip.c:356:38: error: macro "open" passed 4 arguments, but takes just 3 > [00:16:37.254] 356 | ret = CFH->open(fname, -1, mode, CFH); > > [00:16:37.254] | ^ > [00:16:37.254] In file included from ../../../src/include/c.h:1309, > [00:16:37.254] from ../../../src/include/postgres_fe.h:25, > [00:16:37.254] from compress_gzip.c:15: > > On MSVC, some declaration conflicts, for a similar issue: > [00:12:31.966] ../src/bin/pg_dump/compress_io.c(193): error C2371: '_read': redefinition; different basic types > [00:12:31.966] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(252): note: see declaration of '_read' > [00:12:31.966] ../src/bin/pg_dump/compress_io.c(210): error C2371: '_write': redefinition; different basic types > [00:12:31.966] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt_io.h(294): note: see declaration of '_write' > A rename was enough. > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > > doesn't store the passed-in compression_spec. > I am afraid I have not been able to reproduce this error. I tried both debian and freebsd after I addressed the compilation warnings. Which error did you get? Is it still present in the attached? > Hmm. This looks like a gap in the existing tests that we'd better fix first. This CI is green on Linux. As the code stands, the compression level is not stored in the custom format's header as it is no longer relevant information. We can decide to make it relevant for the tests only at the expense of increasing dump size by four bytes. In either case this is not applicable in current head and can wait for 0002's turn. Cheers, //Georgios > > > 003 still uses <lz4.h> and not "lz4.h". > > > This should be <lz4.h>, not "lz4.h". > > -- > Michael
On Mon, Dec 19, 2022 at 05:03:21PM +0000, gkokolatos@pm.me wrote: > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > > > windows. Have you checked test results from cirrusci on your private > > > github account ? > > There are still known gaps in 0002 and 0003, for example documentation, > and I have not been focusing too much on those. You are right, it is helpful > and kind to try to reduce the noise. The attached should have hopefully > tackled the ci errors. Yep. Are you using cirrusci under your github account ? > > FYI, I have re-added an entry to the CF app to get some automated > > coverage: > > https://commitfest.postgresql.org/41/3571/ > > Much obliged. Should I change the state to "ready for review" when post a > new version or should I leave that to the senior personnel? It's better to update it to reflect what you think its current status is. If you think it's ready for review. > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > > > doesn't store the passed-in compression_spec. > > I am afraid I have not been able to reproduce this error. I tried both > debian and freebsd after I addressed the compilation warnings. Which > error did you get? Is it still present in the attached? It's not that there's an error - it's that compression isn't working. $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c 659956 $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c 637192 $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c 1954890 $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c 1954890 -- Justin
On Mon, Dec 19, 2022 at 01:06:00PM +0900, Michael Paquier wrote: > On Sat, Dec 17, 2022 at 05:26:15PM -0600, Justin Pryzby wrote: > > 001: still refers to "gzip", which is correct for -Fp and -Fd but not > > for -Fc, for which it's more correct to say "zlib". > > Or should we begin by changing all these existing "not built with zlib > support" error strings to the more generic "this build does not > support compression with %s" to reduce the number of messages to > translate? That would bring consistency with the other tools dealing > with compression. That's fine, but it doesn't touch on the issue I'm talking about, which is that zlib != gzip. BTW I noticed that that also affects the pg_dump file itself; 002 changes the file format to say "gzip", but that's wrong for -Fc, which does not use gzip headers, which could be surprising to someone who specified "gzip". -- Justin
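The zlib-vs-gzip distinction Justin makes here is visible in the first bytes of the output: a gzip member starts with the magic 0x1f 0x8b, while a raw zlib stream starts with a CMF/FLG byte pair whose 16-bit value is a multiple of 31 (0x78 0x9c for the defaults), with no gzip header at all. A self-contained sketch of a classifier, purely illustrative and not code from the patch set:

```c
#include <assert.h>
#include <stddef.h>

/* Rough classifier for the first two bytes of a compressed stream.
 * gzip (RFC 1952): magic bytes 0x1f 0x8b.
 * zlib (RFC 1950): low nibble of CMF is 8 (deflate), and
 * CMF*256 + FLG is a multiple of 31. */
typedef enum
{
	STREAM_GZIP,
	STREAM_ZLIB,
	STREAM_UNKNOWN
} StreamKind;

static StreamKind
classify_stream(const unsigned char *buf, size_t len)
{
	if (len < 2)
		return STREAM_UNKNOWN;
	if (buf[0] == 0x1f && buf[1] == 0x8b)
		return STREAM_GZIP;
	if ((buf[0] & 0x0f) == 0x08 &&		/* CM = deflate */
		((buf[0] << 8) + buf[1]) % 31 == 0)
		return STREAM_ZLIB;
	return STREAM_UNKNOWN;
}
```

This is why calling the -Fc output "gzip" is misleading: a tool looking for the 0x1f 0x8b magic will not recognize the raw zlib data that the custom format emits.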
------- Original Message ------- On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Mon, Dec 19, 2022 at 05:03:21PM +0000, gkokolatos@pm.me wrote: > > > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > > > > windows. Have you checked test results from cirrusci on your private > > > > github account ? > > > > There are still known gaps in 0002 and 0003, for example documentation, > > and I have not been focusing too much on those. You are right, it is helpful > > and kind to try to reduce the noise. The attached should have hopefully > > tackled the ci errors. > > > Yep. Are you using cirrusci under your github account ? Thank you. To be very honest, I am not using github exclusively to post patches. Sometimes I do, sometimes I do not. Is github a requirement? To answer your question, some of my github accounts are integrated with cirrusci, others are not. The current cfbot build is green for what is worth. https://cirrus-ci.com/build/5934319840002048 > > > > FYI, I have re-added an entry to the CF app to get some automated > > > coverage: > > > https://commitfest.postgresql.org/41/3571/ > > > > Much obliged. Should I change the state to "ready for review" when post a > > new version or should I leave that to the senior personnel? > > > It's better to update it to reflect what you think its current status > is. If you think it's ready for review. Thank you. > > > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > > > > doesn't store the passed-in compression_spec. > > > > I am afraid I have not been able to reproduce this error. I tried both > > debian and freebsd after I addressed the compilation warnings. Which > > error did you get? Is it still present in the attached? > > > It's not that there's an error - it's that compression isn't working. 
> > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c > 659956 > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c > 637192 > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c > 1954890 > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c > 1954890 > Thank you. Now I understand what you mean. Trying the same on top of v18-0003 on Ubuntu 22.04 yields: $ for compression in none gzip:1 gzip:6 gzip:9; do \ pg_dump --format=custom --compress="$compression" -f regression."$compression".dump -d regression; \ wc -c regression."$compression".dump; \ done; 14963753 regression.none.dump 3600183 regression.gzip:1.dump 3223755 regression.gzip:6.dump 3196903 regression.gzip:9.dump and on FreeBSD 13.1 $ for compression in none gzip:1 gzip:6 gzip:9; do \ pg_dump --format=custom --compress="$compression" -f regression."$compression".dump -d regression; \ wc -c regression."$compression".dump; \ done; 14828822 regression.none.dump 3584304 regression.gzip:1.dump 3208548 regression.gzip:6.dump 3182044 regression.gzip:9.dump Although there are some variations between the installations, within the same installation the size of the dump file is shrinking as expected. Investigating a bit further on the issue, you are correct in identifying an issue in v17. 
Up until v16, the compressor function looked like: +InitCompressorGzip(CompressorState *cs, int compressionLevel) +{ + GzipCompressorState *gzipcs; + + cs->readData = ReadDataFromArchiveGzip; + cs->writeData = WriteDataToArchiveGzip; + cs->end = EndCompressorGzip; + + gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState)); + gzipcs->compressionLevel = compressionLevel; V17 considered that more options could become available in the future and changed the signature of the relevant Init functions to: +InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec) +{ + GzipCompressorState *gzipcs; + + cs->readData = ReadDataFromArchiveGzip; + cs->writeData = WriteDataToArchiveGzip; + cs->end = EndCompressorGzip; + + gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState)); + V18 reinstated the assignment in similar fashion to InitCompressorNone and InitCompressorLz4: +void +InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec) +{ + GzipCompressorState *gzipcs; + + cs->readData = ReadDataFromArchiveGzip; + cs->writeData = WriteDataToArchiveGzip; + cs->end = EndCompressorGzip; + + cs->compression_spec = compression_spec; + + gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState)); A test case can be added which performs a check similar to the loop above. Create a custom dump with the least and most compression for each method. Then verify that the output sizes differ as expected. This addition could become 0001 in the current series. Thoughts? Cheers, //Georgios > -- > Justin
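The v17 regression described above reduces to a single missing assignment: the Init function wires up the callbacks but never copies the caller's specification into the state, so later code sees the zero-initialized default and compresses at whatever that implies. A stripped-down illustration with simplified stand-in types, not pg_dump's real definitions:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for pg_dump's compression types. */
typedef struct
{
	int			algorithm;		/* e.g. 0 = none, 1 = gzip */
	int			level;
} compress_spec;

typedef struct
{
	compress_spec compression_spec;
} CompressorState;

/* Like the v17 code: sets up state but forgets to store the spec,
 * so the zeroed default survives and -Z2 is silently ignored. */
static void
init_compressor_buggy(CompressorState *cs, compress_spec spec)
{
	memset(cs, 0, sizeof(*cs));
	(void) spec;				/* oops: never copied into cs */
}

/* Like the v18 fix: the one-line assignment is reinstated. */
static void
init_compressor_fixed(CompressorState *cs, compress_spec spec)
{
	memset(cs, 0, sizeof(*cs));
	cs->compression_spec = spec;
}
```

A size comparison like the loop above catches this class of bug because dumps at different requested levels come out byte-identical when the level is dropped on the floor.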
On Tue, Dec 20, 2022 at 11:19:15AM +0000, gkokolatos@pm.me wrote: > ------- Original Message ------- > On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Dec 19, 2022 at 05:03:21PM +0000, gkokolatos@pm.me wrote: > > > > > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > > > > > windows. Have you checked test results from cirrusci on your private > > > > > github account ? > > > > > > There are still known gaps in 0002 and 0003, for example documentation, > > > and I have not been focusing too much on those. You are right, it is helpful > > > and kind to try to reduce the noise. The attached should have hopefully > > > tackled the ci errors. > > > > > > Yep. Are you using cirrusci under your github account ? > > Thank you. To be very honest, I am not using github exclusively to post patches. > Sometimes I do, sometimes I do not. Is github a requirement? Github isn't a requirement for postgres (but cirrusci only supports github). I wasn't trying to say that it's required, only trying to make sure that you (and others) know that it's available, since our cirrus.yml is relatively new. > > > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > > > > > doesn't store the passed-in compression_spec. > > > > > > I am afraid I have not been able to reproduce this error. I tried both > > > debian and freebsd after I addressed the compilation warnings. Which > > > error did you get? Is it still present in the attached? > > > > > > It's not that there's an error - it's that compression isn't working. 
> > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c > > 659956 > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c > > 637192 > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c > > 1954890 > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c > > 1954890 > > > > Thank you. Now I understand what you mean. Trying the same on top of v18-0003 > on Ubuntu 22.04 yields: You're right; this seems to be fixed in v18. Thanks. It looks like I'd forgotten to run "meson test tmp_install", so had retested v17... -- Justin
------- Original Message ------- On Tuesday, December 20th, 2022 at 4:26 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Tue, Dec 20, 2022 at 11:19:15AM +0000, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby pryzby@telsasoft.com wrote: > > > > > On Mon, Dec 19, 2022 at 05:03:21PM +0000, gkokolatos@pm.me wrote: > > > > > > > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > > > > > > windows. Have you checked test results from cirrusci on your private > > > > > > github account ? > > > > > > > > There are still known gaps in 0002 and 0003, for example documentation, > > > > and I have not been focusing too much on those. You are right, it is helpful > > > > and kind to try to reduce the noise. The attached should have hopefully > > > > tackled the ci errors. > > > > > > Yep. Are you using cirrusci under your github account ? > > > > Thank you. To be very honest, I am not using github exclusively to post patches. > > Sometimes I do, sometimes I do not. Is github a requirement? > > > Github isn't a requirement for postgres (but cirrusci only supports > github). I wasn't not trying to say that it's required, only trying to > make sure that you (and others) know that it's available, since our > cirrus.yml is relatively new. Got it. Thank you very much for spreading the word. It is a useful feature which should be known. > > > > > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > > > > > > doesn't store the passed-in compression_spec. > > > > > > > > I am afraid I have not been able to reproduce this error. I tried both > > > > debian and freebsd after I addressed the compilation warnings. Which > > > > error did you get? Is it still present in the attached? > > > > > > It's not that there's an error - it's that compression isn't working. 
> > > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c > > > 659956 > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c > > > 637192 > > > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c > > > 1954890 > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c > > > 1954890 > > > > Thank you. Now I understand what you mean. Trying the same on top of v18-0003 > > on Ubuntu 22.04 yields: > > > You're right; this seems to be fixed in v18. Thanks. Great. Still there was a bug in v17 which you discovered. Thank you for the review effort. Please find in the attached v19 an extra check right before calling deflateInit(). This check will verify that only compressed output will be generated for this method. Also v19 is rebased on top f450695e889 and applies cleanly. Cheers. //Georgios > -- > Justin
There's a couple of lz4 bits which shouldn't be present in 002: file extension and comments.
On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote: > There's a couple of lz4 bits which shouldn't be present in 002: file > extension and comments. There were "LZ4" comments and file extension stuff in the preparatory commit. But now it seems like you *removed* them in the LZ4 commit (where it actually belongs) rather than *moving* it from the prior/parent commit *to* the lz4 commit. I recommend to run something like "git diff @{1}" whenever doing this kind of patch surgery. + if (AH->compression_spec.algorithm != PG_COMPRESSION_NONE && + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && This looks wrong/redundant. The gzip part should be removed, right ? Maybe other places that check if (compression==PG_COMPRESSION_GZIP) should change to say compression!=NONE? _PrepParallelRestore() references ".gz", so I think it needs to be retrofitted to handle .lz4. Ideally, that's built into a struct or list of file extensions to try. Maybe compression.h should have a function to return the file extension of a given algorithm. I'm planning to send a patch for zstd, and hoping its changes will be minimized by these preparatory commits. + errno = errno ? : ENOSPC; "?:" is a GNU extension (not the ternary operator, but the ternary operator with only 2 args). It's not in use anywhere else in postgres. You could instead write it with 3 "errno"s or as "if (errno == 0) errno = ENOSPC;" You wrote "eol_flag == false", "eol_flag == 0", and comparisons against true. But it's cleaner to test it as a boolean: if (eol_flag) / if (!eol_flag). Both LZ4File_init() and its callers check "inited". Better to do it in one place than 3. It's a static function, so I think there's no performance concern. Gzip_close() still has a useless save_errno (or rebase issue?). I think it's confusing to have two functions, one named InitCompressLZ4() and InitCompressorLZ4(). pg_compress_specification is being passed by value, but I think it should be passed as a pointer, as is done everywhere else. 
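For reference, the GNU two-operand `?:` can be spelled portably in either of the ways Justin mentions; both keep the tested value when it is nonzero and substitute ENOSPC otherwise. A minimal sketch:

```c
#include <assert.h>
#include <errno.h>

/* Portable equivalent of the GNU extension "errno = errno ? : ENOSPC;"
 * using the plain three-operand conditional. */
static int
default_errno_ternary(int err)
{
	return err ? err : ENOSPC;
}

/* The same behavior written out explicitly. */
static int
default_errno_if(int err)
{
	if (err == 0)
		err = ENOSPC;
	return err;
}
```

Both forms evaluate the tested value twice in source text, but since errno is a plain lvalue here there is no side-effect concern, which is the only thing the GNU shorthand buys.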
pg_compress_algorithm is being written directly into the pg_dump header. Currently, I think that's not an externally-visible value (it could be renumbered, theoretically even in a minor release). Maybe there should be a "private" enum for encoding the pg_dump header, similar to WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ? Or else a comment there should warn that the values are encoded in pg_dump, and must never be changed. + Verify that data files where compressed typo: s/where/were/ Also: s/occurance/occurrence/ s/begining/beginning/ s/Verfiy/Verify/ s/nessary/necessary/ BTW I noticed that cfdopen() was accidentally committed to compress_io.h in master without being defined anywhere. -- Justin
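The compression.h extension helper proposed above could look roughly like the following; the enum values and the function name are hypothetical, not the actual patch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical algorithm enum, standing in for pg_compress_algorithm. */
typedef enum
{
	COMPR_NONE,
	COMPR_GZIP,
	COMPR_LZ4,
	COMPR_ZSTD
} compress_algorithm;

/* Map an algorithm to the suffix its data files carry, so callers such
 * as a parallel-restore path can try each known suffix instead of
 * hard-coding ".gz". */
static const char *
compress_file_extension(compress_algorithm alg)
{
	switch (alg)
	{
		case COMPR_GZIP:
			return ".gz";
		case COMPR_LZ4:
			return ".lz4";
		case COMPR_ZSTD:
			return ".zst";
		default:
			return "";			/* uncompressed files get no suffix */
	}
}
```

With such a helper, adding zstd support would touch the enum and this one switch rather than every place that currently appends ".gz" by hand.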
On Wed, 21 Dec 2022 at 15:40, <gkokolatos@pm.me> wrote: > > > > > > > ------- Original Message ------- > On Tuesday, December 20th, 2022 at 4:26 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > On Tue, Dec 20, 2022 at 11:19:15AM +0000, gkokolatos@pm.me wrote: > > > > > ------- Original Message ------- > > > On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby pryzby@telsasoft.com wrote: > > > > > > > On Mon, Dec 19, 2022 at 05:03:21PM +0000, gkokolatos@pm.me wrote: > > > > > > > > > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on > > > > > > > windows. Have you checked test results from cirrusci on your private > > > > > > > github account ? > > > > > > > > > > There are still known gaps in 0002 and 0003, for example documentation, > > > > > and I have not been focusing too much on those. You are right, it is helpful > > > > > and kind to try to reduce the noise. The attached should have hopefully > > > > > tackled the ci errors. > > > > > > > > Yep. Are you using cirrusci under your github account ? > > > > > > Thank you. To be very honest, I am not using github exclusively to post patches. > > > Sometimes I do, sometimes I do not. Is github a requirement? > > > > > > Github isn't a requirement for postgres (but cirrusci only supports > > github). I wasn't not trying to say that it's required, only trying to > > make sure that you (and others) know that it's available, since our > > cirrus.yml is relatively new. > > Got it. Thank you very much for spreading the word. It is a useful feature which > should be known. > > > > > > > > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor() > > > > > > > doesn't store the passed-in compression_spec. > > > > > > > > > > I am afraid I have not been able to reproduce this error. I tried both > > > > > debian and freebsd after I addressed the compilation warnings. Which > > > > > error did you get? Is it still present in the attached? 
> > > > > > > > It's not that there's an error - it's that compression isn't working. > > > > > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c > > > > 659956 > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c > > > > 637192 > > > > > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c > > > > 1954890 > > > > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c > > > > 1954890 > > > > > > Thank you. Now I understand what you mean. Trying the same on top of v18-0003 > > > on Ubuntu 22.04 yields: > > > > > > You're right; this seems to be fixed in v18. Thanks. > > Great. Still there was a bug in v17 which you discovered. Thank you for the review > effort. > > Please find in the attached v19 an extra check right before calling deflateInit(). > This check will verify that only compressed output will be generated for this > method. > > Also v19 is rebased on top f450695e889 and applies cleanly. The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID ff23b592ad6621563d3128b26860bcb41daf9542 === === applying patch ./v19-0002-Introduce-Compressor-API-in-pg_dump.patch patching file src/bin/pg_dump/compress_io.h Hunk #1 FAILED at 37. 1 out of 1 hunk FAILED -- saving rejects to file src/bin/pg_dump/compress_io.h.rej [1] - http://cfbot.cputube.org/patch_41_3571.log Regards, Vignesh
On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote: > On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote: > > There's a couple of lz4 bits which shouldn't be present in 002: file > > extension and comments. > BTW I noticed that cfdopen() was accidentally committed to compress_io.h > in master without being defined anywhere. This was resolved in 69fb29d1a (so now needs to be re-added for this patch series). > pg_compress_specification is being passed by value, but I think it > should be passed as a pointer, as is done everywhere else. ISTM that was an issue with 5e73a6048, affecting a few public and private functions. I wrote a pre-preparatory patch which changes to pass by reference. And addressed a handful of other issues I reported as separate fixup commits. And changed to use LZ4 by default for CI. I also rebased my 2 year old patch to support zstd in pg_dump. I hope it can finally added for v16. I'll send it for the next CF if these patches progress. One more thing: some comments still refer to the cfopen API, which this patch removes. > There were "LZ4" comments and file extension stuff in the preparatory > commit. But now it seems like you *removed* them in the LZ4 commit > (where it actually belongs) rather than *moving* it from the > prior/parent commit *to* the lz4 commit. I recommend to run something > like "git diff @{1}" whenever doing this kind of patch surgery. TODO > Maybe other places that check if (compression==PG_COMPRESSION_GZIP) > should maybe change to say compression!=NONE? > > _PrepParallelRestore() references ".gz", so I think it needs to be > retrofitted to handle .lz4. Ideally, that's built into a struct or list > of file extensions to try. Maybe compression.h should have a function > to return the file extension of a given algorithm. I'm planning to send > a patch for zstd, and hoping its changes will be minimized by these > preparatory commits. 
TODO

> I think it's confusing to have two functions, one named
> InitCompressLZ4() and InitCompressorLZ4().

TODO

> pg_compress_algorithm is being written directly into the pg_dump header.
> Currently, I think that's not an externally-visible value (it could be
> renumbered, theoretically even in a minor release). Maybe there should
> be a "private" enum for encoding the pg_dump header, similar to
> WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4? Or else a comment there
> should warn that the values are encoded in pg_dump, and must never be
> changed.

Michael, WDYT?

--
Justin
Attachments
- 0001-pg_dump-pass-pg_compress_specification-as-a-pointer.patch
- 0002-Prepare-pg_dump-internals-for-additional-compression.patch
- 0003-Introduce-Compressor-API-in-pg_dump.patch
- 0004-f.patch
- 0005-Add-LZ4-compression-in-pg_-dump-restore.patch
- 0006-f.patch
- 0007-TMP-pg_dump-use-lz4-by-default-for-CI-only.patch
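The per-algorithm file-extension lookup suggested a few messages above (so that code like _PrepParallelRestore() can probe for ".gz", ".lz4", etc. from one place) could be sketched roughly like this. The enum and function names are illustrative stand-ins, not the actual PostgreSQL definitions:

```c
/*
 * Hypothetical sketch: map a compression algorithm to the file suffix it
 * produces, so callers probing for compressed files loop over one table
 * instead of hardcoding ".gz" in several places.
 */
typedef enum
{
	COMPR_NONE,
	COMPR_GZIP,
	COMPR_LZ4,
	COMPR_ZSTD
} compr_algorithm;

static const char *
compr_file_extension(compr_algorithm algorithm)
{
	switch (algorithm)
	{
		case COMPR_GZIP:
			return ".gz";
		case COMPR_LZ4:
			return ".lz4";
		case COMPR_ZSTD:
			return ".zst";
		case COMPR_NONE:
			return "";
	}
	return "";					/* unreachable; silences compilers */
}
```

A restore path could then try `basename + compr_file_extension(alg)` for each supported algorithm, which is the centralization the message above argues for.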
On Sat, Jan 14, 2023 at 03:43:08PM -0600, Justin Pryzby wrote: > On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote: > > pg_compress_specification is being passed by value, but I think it > > should be passed as a pointer, as is done everywhere else. > > ISTM that was an issue with 5e73a6048, affecting a few public and > private functions. I wrote a pre-preparatory patch which changes to > pass by reference. I updated 001 to change SetOutput() to pass by reference, too (before, that ended up in the 002 patch). I can't see any issue in 002 other than the == GZIP change (the fix for which I'd previously included in a later patch). > One more thing: some comments still refer to the cfopen API, which this > patch removes. > > > There were "LZ4" comments and file extension stuff in the preparatory > > commit. But now it seems like you *removed* them in the LZ4 commit > > (where it actually belongs) rather than *moving* it from the > > prior/parent commit *to* the lz4 commit. I recommend to run something > > like "git diff @{1}" whenever doing this kind of patch surgery. > > TODO I addressed that in the fixup commits 005 and 007. -- Justin
Attachments
- 0001-pg_dump-pass-pg_compress_specification-as-a-pointer.patch
- 0002-Prepare-pg_dump-internals-for-additional-compression.patch
- 0003-f.patch
- 0004-Introduce-Compressor-API-in-pg_dump.patch
- 0005-f.patch
- 0006-Add-LZ4-compression-in-pg_-dump-restore.patch
- 0007-f.patch
- 0008-TMP-pg_dump-use-lz4-by-default-for-CI-only.patch
On Sat, Jan 14, 2023 at 03:43:09PM -0600, Justin Pryzby wrote: > On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote: >> pg_compress_specification is being passed by value, but I think it >> should be passed as a pointer, as is done everywhere else. > > ISTM that was an issue with 5e73a6048, affecting a few public and > private functions. I wrote a pre-preparatory patch which changes to > pass by reference. The functions changed by 0001 are cfopen[_write](), AllocateCompressor() and ReadDataFromArchive(). Why is it a good idea to change these interfaces which basically exist to handle inputs? Is there some benefit in changing compression_spec within the internals of these routines before going back one layer down to their callers? Changing the compression_spec on-the-fly in these internal paths could be risky, actually, no? > And addressed a handful of other issues I reported as separate fixup > commits. And changed to use LZ4 by default for CI. Are your slight changes shaped as of 0003-f.patch, 0005-f.patch and 0007-f.patch on top of the original patches sent by Georgios? > I also rebased my 2 year old patch to support zstd in pg_dump. I hope > it can finally added for v16. I'll send it for the next CF if these > patches progress. Good idea to see if what you have done for zstd fits with what's presented here. >> pg_compress_algorithm is being writen directly into the pg_dump header. Do you mean that this is what happens once the patch series 0001~0008 sent upthread is applied on HEAD? >> Currently, I think that's not an externally-visible value (it could be >> renumbered, theoretically even in a minor release). Maybe there should >> be a "private" enum for encoding the pg_dump header, similar to >> WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ? Or else a comment there >> should warn that the values are encoded in pg_dump, and must never be >> changed. > > Michael, WDYT ? 
Changing the order of the members in an enum would cause an ABI breakage,
so that would not happen, and we tend to be very careful about that.
Appending new members would be fine, though. FWIW, I'd rather avoid
adding more enums that would just be exact maps to pg_compress_algorithm.

- /*
-  * For now the compression type is implied by the level. This will need
-  * to change once support for more compression algorithms is added,
-  * requiring a format bump.
-  */
- WriteInt(AH, AH->compression_spec.level);
+ AH->WriteBytePtr(AH, AH->compression_spec.algorithm);

I may be missing something here, but it seems to me that you ought to
store the level in the dump header as well, or it would not be possible
to report in the dump's description what was used? Hence, K_VERS_1_15
should imply that we have both the compression method and the compression
level.

--
Michael
Attachments
On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
> On Sat, Jan 14, 2023 at 03:43:09PM -0600, Justin Pryzby wrote:
> > On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
> >> pg_compress_specification is being passed by value, but I think it
> >> should be passed as a pointer, as is done everywhere else.
> >
> > ISTM that was an issue with 5e73a6048, affecting a few public and
> > private functions. I wrote a pre-preparatory patch which changes to
> > pass by reference.
>
> The functions changed by 0001 are cfopen[_write](),
> AllocateCompressor() and ReadDataFromArchive(). Why is it a good idea
> to change these interfaces which basically exist to handle inputs?
I changed to pass pg_compress_specification as a pointer, since that's
the usual convention for structs, as followed by the existing uses of
pg_compress_specification.
> Is there some benefit in changing compression_spec within the
> internals of these routines before going back one layer down to their
> callers? Changing the compression_spec on-the-fly in these internal
> paths could be risky, actually, no?
I think what you're saying is that if the spec is passed as a pointer,
then the called functions shouldn't set spec->algorithm=something.
I agree that if they need to do that, they should use a local variable.
Which looks to be true for the functions that were changed in 001.
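The convention being agreed on here — pass the specification by pointer, and have the callee copy it into a local variable if it needs to resolve defaults, so the caller's struct is never mutated — can be sketched as below. The struct and field names are simplified stand-ins for pg_compress_specification, not the real thing:

```c
/*
 * Minimal sketch: a const pointer documents that the callee treats the
 * spec as input-only; any adjustment happens on a local copy.
 */
typedef struct
{
	int			algorithm;		/* 0 = none, 1 = gzip (illustrative) */
	int			level;			/* -1 means "use the default" */
} compress_spec;

static int
effective_level(const compress_spec *spec)
{
	compress_spec local = *spec;	/* tweak the copy, not the caller's */

	if (local.algorithm == 1 && local.level < 0)
		local.level = 6;		/* hypothetical default for gzip */

	return local.level;
}
```

With `const` in the signature, a careless `spec->algorithm = ...` inside the routine becomes a compile error, which addresses the on-the-fly mutation risk raised above.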
> > And addressed a handful of other issues I reported as separate fixup
> > commits. And changed to use LZ4 by default for CI.
>
> Are your slight changes shaped as of 0003-f.patch, 0005-f.patch and
> 0007-f.patch on top of the original patches sent by Georgios?
Yes, the original patches, rebased as needed on top of HEAD and 001...
> >> pg_compress_algorithm is being written directly into the pg_dump header.
>
> Do you mean that this is what happens once the patch series 0001~0008
> sent upthread is applied on HEAD?
Yes
> - /*
> - * For now the compression type is implied by the level. This will need
> - * to change once support for more compression algorithms is added,
> - * requiring a format bump.
> - */
> - WriteInt(AH, AH->compression_spec.level);
> + AH->WriteBytePtr(AH, AH->compression_spec.algorithm);
>
> I may be missing something here, but it seems to me that you ought to
> store as well the level in the dump header, or it would not be
> possible to report in the dump's description what was used? Hence,
> K_VERS_1_15 should imply that we have both the method compression and
> the compression level.
Maybe. But the "level" isn't needed for decompression for any case I'm
aware of.
Also, dumps with the default compression level currently say:
"Compression: -1", which doesn't seem valuable.
--
Justin
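The "private enum" idea discussed above (a fixed on-disk encoding decoupled from the in-memory pg_compress_algorithm, which could in principle be renumbered) could look roughly like this. All names here are hypothetical, for illustration only:

```c
/*
 * Sketch: the byte written into the dump header comes from a translation
 * table whose values are append-only and never renumbered, so reordering
 * the in-memory enum cannot silently change the dump format.
 */
typedef enum
{
	MEM_COMPRESSION_NONE,
	MEM_COMPRESSION_GZIP,
	MEM_COMPRESSION_LZ4
} mem_compression;

/* On-disk codes: never renumber, only append. */
#define DUMP_COMPRESSION_NONE	0
#define DUMP_COMPRESSION_GZIP	1
#define DUMP_COMPRESSION_LZ4	2

static unsigned char
encode_compression_for_header(mem_compression algorithm)
{
	switch (algorithm)
	{
		case MEM_COMPRESSION_NONE:
			return DUMP_COMPRESSION_NONE;
		case MEM_COMPRESSION_GZIP:
			return DUMP_COMPRESSION_GZIP;
		case MEM_COMPRESSION_LZ4:
			return DUMP_COMPRESSION_LZ4;
	}
	return DUMP_COMPRESSION_NONE;	/* unreachable */
}
```

The alternative raised in the thread is to skip the extra enum and instead document that pg_compress_algorithm's values are encoded in dump files and must never change; either way the constraint has to live somewhere.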
On Mon, Jan 16, 2023 at 02:27:56AM +0000, gkokolatos@pm.me wrote:
> Oh, I didn’t realize you took over Justin? Why? After almost a year of work?
>
> This is rather disheartening.

I believe you've misunderstood my intent here. I sent rebased versions of
your patches with fixup commits implementing fixes that I'd previously
sent. I don't think that's unusual. I hope your patches will be included
in v16, and I hope to facilitate that. I don't mean any offense.

Actually, the fixups are provided as separate patches so you can adopt
the changes easily into your branch.

--
Justin
On Sun, Jan 15, 2023 at 07:56:25PM -0600, Justin Pryzby wrote:
> On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
>> The functions changed by 0001 are cfopen[_write](),
>> AllocateCompressor() and ReadDataFromArchive(). Why is it a good idea
>> to change these interfaces which basically exist to handle inputs?
>
> I changed to pass pg_compress_specification as a pointer, since that's
> the usual convention for structs, as followed by the existing uses of
> pg_compress_specification.

Okay, but what do we gain here? It seems to me that this introduces the
risk that a careless change in one of the internal routines could
slightly change compress_spec, hence impacting any of their callers? Or
is that fixing an actual bug (except if I am missing your point, that
does not seem to be the case)?

>> Is there some benefit in changing compression_spec within the
>> internals of these routines before going back one layer down to their
>> callers? Changing the compression_spec on-the-fly in these internal
>> paths could be risky, actually, no?
>
> I think what you're saying is that if the spec is passed as a pointer,
> then the called functions shouldn't set spec->algorithm=something.

Yes. HEAD makes sure of that, 0001 would not prevent that. So I am a bit
confused in seeing how this is a benefit.

--
Michael
Attachments
Hi,

I admit I am completely at a loss as to what is expected from me anymore.
I had posted v19-0001 for a committer's consideration and v19-000{2,3}
for completeness. Please find a rebased v20 attached.

Also please let me know if I should silently step away from it and let
other people lead it. I would be glad to comply either way.

Cheers,
//Georgios

------- Original Message -------
On Monday, January 16th, 2023 at 3:54 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Sun, Jan 15, 2023 at 07:56:25PM -0600, Justin Pryzby wrote:
>> On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote:
>>> The functions changed by 0001 are cfopen[_write](),
>>> AllocateCompressor() and ReadDataFromArchive(). Why is it a good idea
>>> to change these interfaces which basically exist to handle inputs?
>>
>> I changed to pass pg_compress_specification as a pointer, since that's
>> the usual convention for structs, as followed by the existing uses of
>> pg_compress_specification.
>
> Okay, but what do we gain here? It seems to me that this introduces the
> risk that a careless change in one of the internal routines could
> slightly change compress_spec, hence impacting any of their callers? Or
> is that fixing an actual bug (except if I am missing your point, that
> does not seem to be the case)?
>
>>> Is there some benefit in changing compression_spec within the
>>> internals of these routines before going back one layer down to their
>>> callers? Changing the compression_spec on-the-fly in these internal
>>> paths could be risky, actually, no?
>>
>> I think what you're saying is that if the spec is passed as a pointer,
>> then the called functions shouldn't set spec->algorithm=something.
>
> Yes. HEAD makes sure of that, 0001 would not prevent that. So I am a
> bit confused in seeing how this is a benefit.
> --
> Michael
Attachments
Hi, On 1/16/23 16:14, gkokolatos@pm.me wrote: > Hi, > > I admit I am completely at lost as to what is expected from me anymore. > :-( I understand it's frustrating not to know why a patch is not moving forward. Particularly when is seems fairly straightforward ... Let me briefly explain my personal (and admittedly very subjective) view on picking what patches to review/commit. I'm sure other committers have other criteria, but maybe this will help. There are always more patches than I can review/commit, so I have to prioritize, and pick which patches to look at. For me, it's mostly about cost/benefit of the patch. The cost is e.g. the amount of time I need to spend to review/commit the stuff, maybe read the thread, etc. Benefits is mainly the new features/improvements. It's oversimplified, we could talk about various bits that contribute to the costs and benefits, but this is what it boils down. There's always the aspect of time - patches A and B have roughly the same benefits, but with A we get it "immediately" while B requires additional parts that we don't have ready yet (and if they don't make it we get no benefit), I'll probably pick A. Unfortunately, this plays against this patch - I'm certainly in favor of adding lz4 (and other compression algos) into pg_dump, but if I commit 0001 we get little benefit, and the other parts actually adding lz4/zstd are treated as "WIP / for completeness" so it's unclear when we'd get to commit them. So if I could recommend one thing, it'd be to get at least one of those WIP patches into a shape that's likely committable right after 0001. > I had posted v19-0001 for a committer's consideration and v19-000{2,3} for completeness. > Please find a rebased v20 attached. 
I took a quick look at 0001, so a couple comments (sorry if some of this
was already discussed in the thread):

1) I don't think a "refactoring" patch should reference particular
compression algorithms (lz4/zstd), and in particular I don't think we
should have "not yet implemented" messages. We only have a couple other
places doing that, when we didn't have a better choice. But here we can
simply reject the algorithm when parsing the options, we don't need to
do that in a dozen other places.

2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
"none" at the end. It might make backpatches harder.

3) While building, I get a bunch of warnings about a missing cfdopen()
prototype and pg_backup_archiver.c not knowing about cfdopen() and
adding an implicit prototype (so I doubt it actually works).

4) The "cfp" struct no longer wraps gzFile, but the comment was not
updated. FWIW I'm not sure switching to "void *" is an improvement;
maybe it'd be better to have a "union" of correct types?

5) cfopen/cfdopen are missing comments. cfopen_internal has an updated
comment, but that's a static function while cfopen/cfdopen are the
actual API.

> Also please let me know if I should silently step away from it and let
> other people lead it. I would be glad to comply either way.

Please don't. I promise to take a look at this patch again.

Thanks for doing all the work.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
------- Original Message ------- On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > Hi, > > On 1/16/23 16:14, gkokolatos@pm.me wrote: > > > Hi, > > > > I admit I am completely at lost as to what is expected from me anymore. > <snip> > > Unfortunately, this plays against this patch - I'm certainly in favor of > adding lz4 (and other compression algos) into pg_dump, but if I commit > 0001 we get little benefit, and the other parts actually adding lz4/zstd > are treated as "WIP / for completeness" so it's unclear when we'd get to > commit them. Thank you for your kindness and for taking the time to explain. > So if I could recommend one thing, it'd be to get at least one of those > WIP patches into a shape that's likely committable right after 0001. This was clearly my fault. I misunderstood a suggestion upthread to focus on the first patch of the series and ignore documentation and comments on the rest. Please find v21 to contain 0002 and 0003 in a state which I no longer consider as WIP but worthy of proper consideration. Some guidance on where is best to add documentation in 0002 for the function pointers in CompressFileHandle will be welcomed. > > > I had posted v19-0001 for a committer's consideration and v19-000{2,3} for completeness. > > Please find a rebased v20 attached. > > > I took a quick look at 0001, so a couple comments (sorry if some of this > was already discussed in the thread): Much appreciated! > > 1) I don't think a "refactoring" patch should reference particular > compression algorithms (lz4/zstd), and in particular I don't think we > should have "not yet implemented" messages. We only have a couple other > places doing that, when we didn't have a better choice. But here we can > simply reject the algorithm when parsing the options, we don't need to > do that in a dozen other places. 
I have now removed lz4/zstd from where they were present, with the
exception of pg_dump.c, which is responsible for parsing.

> 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
> "none" at the end. It might make backpatches harder.

Agreed. However a 'default' is needed in order to avoid compilation
warnings. Also note that 0002 completely does away with cases within
WriteDataToArchive.

> 3) While building, I get bunch of warnings about missing cfdopen()
> prototype and pg_backup_archiver.c not knowing about cfdopen() and
> adding an implicit prototype (so I doubt it actually works).

Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got
removed in 69fb29d1af. v20 failed to properly take 69fb29d1af into
consideration. Note that cfdopen is removed in 0002, which explains why
cfbot didn't complain.

> 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
> FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
> better to have a "union" of correct types?

Please find an updated comment and a union in place of the void *. Also
note that 0002 completely does away with cfp in favour of a new struct
CompressFileHandle. I maintained the void * there because it is used by
private methods of the compressors. 0003 contains such an example with
LZ4CompressorState.

> 5) cfopen/cfdopen are missing comments. cfopen_internal has an updated
> comment, but that's a static function while cfopen/cfdopen are the
> actual API.

Added comments to cfopen/cfdopen.

> Also please let me know if I should silently step away from it and let
> other people lead it. I would be glad to comply either way.

Please don't. I promise to take a look at this patch again.

Thank you very much.

> Thanks for doing all the work.

Thank you.

Cheers,
//Georgios

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Attachments
On 1/18/23 20:05, gkokolatos@pm.me wrote: > > > > > > ------- Original Message ------- > On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > >> >> >> Hi, >> >> On 1/16/23 16:14, gkokolatos@pm.me wrote: >> >>> Hi, >>> >>> I admit I am completely at lost as to what is expected from me anymore. >> > <snip> >> >> Unfortunately, this plays against this patch - I'm certainly in favor of >> adding lz4 (and other compression algos) into pg_dump, but if I commit >> 0001 we get little benefit, and the other parts actually adding lz4/zstd >> are treated as "WIP / for completeness" so it's unclear when we'd get to >> commit them. > > Thank you for your kindness and for taking the time to explain. > >> So if I could recommend one thing, it'd be to get at least one of those >> WIP patches into a shape that's likely committable right after 0001. > > This was clearly my fault. I misunderstood a suggestion upthread to focus > on the first patch of the series and ignore documentation and comments on > the rest. > > Please find v21 to contain 0002 and 0003 in a state which I no longer consider > as WIP but worthy of proper consideration. Some guidance on where is best to add > documentation in 0002 for the function pointers in CompressFileHandle will > be welcomed. > This is internal-only API, not meant for use by regular users and/or extension authors, so I don't think we need sgml docs. I'd just add regular code-level documentation to compress_io.h. For inspiration see docs for "struct ReorderBuffer" in reorderbuffer.h, or "struct _archiveHandle" in pg_backup_archiver.h. Or what other kind of documentation you had in mind? >> >>> I had posted v19-0001 for a committer's consideration and v19-000{2,3} for completeness. >>> Please find a rebased v20 attached. >> >> >> I took a quick look at 0001, so a couple comments (sorry if some of this >> was already discussed in the thread): > > Much appreciated! 
> >> >> 1) I don't think a "refactoring" patch should reference particular >> compression algorithms (lz4/zstd), and in particular I don't think we >> should have "not yet implemented" messages. We only have a couple other >> places doing that, when we didn't have a better choice. But here we can >> simply reject the algorithm when parsing the options, we don't need to >> do that in a dozen other places. > > I have now removed lz4/zstd from where they were present with the exception > of pg_dump.c which is responsible for parsing. > I'm not sure I understand why leave the lz4/zstd in this place? >> 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep >> "none" at the end. It might make backpatches harder. > > Agreed. However a 'default' is needed in order to avoid compilation warnings. > Also note that 0002 completely does away with cases within WriteDataToArchive. > OK, although that's also a consequence of using a "switch" instead of plan "if" branches. Furthermore, I'm not sure we really need the pg_fatal() about invalid compression method in these default blocks. I mean, how could we even get to these places when the build does not support the algorithm? All of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...) happens looong after the compressor was initialized and the method checked, no? So maybe either this should simply do Assert(false) or use a different error message. >> 3) While building, I get bunch of warnings about missing cfdopen() >> prototype and pg_backup_archiver.c not knowing about cfdopen() and >> adding an implicit prototype (so I doubt it actually works). > > Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got removed > in 69fb29d1af. v20 failed to properly take 69fb29d1af in consideration. Note > that cfdopen is removed in 0002 which explains why cfbot didn't complain. > OK. >> 4) "cfp" struct no longer wraps gzFile, but the comment was not updated. 
>> FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be >> better to have a "union" of correct types? > > Please find and updated comment and a union in place of the void *. Also > note that 0002 completely does away with cfp in favour of a new struct > CompressFileHandle. I maintained the void * there because it is used by > private methods of the compressors. 0003 contains such an example with > LZ4CompressorState. > I wonder if this (and also the previous item) makes sense to keep 0001 and 0002 or to combine them. The "intermediate" state is a bit annoying. >> 5) cfopen/cfdopen are missing comments. cfopen_internal has an updated >> comment, but that's a static function while cfopen/cfdopen are the >> actual API. > > Added comments to cfopen/cfdopen. > OK. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
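The "union of correct types" suggestion above could give the handle struct a shape along these lines. This is only an illustration of the layout; gzFile is stubbed with FILE * here, and none of the names are the real compress_io.c code:

```c
#include <stdio.h>
#include <stdbool.h>

/*
 * Sketch: a tagged union instead of an opaque void *, so the compiler
 * knows which stream type each branch is allowed to touch.
 */
typedef enum
{
	FP_UNCOMPRESSED,
	FP_GZIP
} cfp_kind;

typedef struct
{
	cfp_kind	kind;
	union
	{
		FILE	   *uncompressed;	/* plain stdio stream */
		FILE	   *gz;				/* stand-in for gzFile */
	}			fp;
} cfp_sketch;

static bool
cfp_is_compressed(const cfp_sketch *handle)
{
	return handle->kind != FP_UNCOMPRESSED;
}
```

The tag makes the "which member is live" question explicit, which is the readability gain over `void *`; the trade-off, as the thread notes, is that per-compressor private state may still want an opaque pointer.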
------- Original Message ------- On Thursday, January 19th, 2023 at 4:45 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > On 1/18/23 20:05, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: > > > > > Hi, > > > > > > On 1/16/23 16:14, gkokolatos@pm.me wrote: > > > > > > > Hi, > > > > > > > > I admit I am completely at lost as to what is expected from me anymore. > > > > <snip> > > > > > Unfortunately, this plays against this patch - I'm certainly in favor of > > > adding lz4 (and other compression algos) into pg_dump, but if I commit > > > 0001 we get little benefit, and the other parts actually adding lz4/zstd > > > are treated as "WIP / for completeness" so it's unclear when we'd get to > > > commit them. > > > > Thank you for your kindness and for taking the time to explain. > > > > > So if I could recommend one thing, it'd be to get at least one of those > > > WIP patches into a shape that's likely committable right after 0001. > > > > This was clearly my fault. I misunderstood a suggestion upthread to focus > > on the first patch of the series and ignore documentation and comments on > > the rest. > > > > Please find v21 to contain 0002 and 0003 in a state which I no longer consider > > as WIP but worthy of proper consideration. Some guidance on where is best to add > > documentation in 0002 for the function pointers in CompressFileHandle will > > be welcomed. > > > This is internal-only API, not meant for use by regular users and/or > extension authors, so I don't think we need sgml docs. I'd just add > regular code-level documentation to compress_io.h. > > For inspiration see docs for "struct ReorderBuffer" in reorderbuffer.h, > or "struct _archiveHandle" in pg_backup_archiver.h. > > Or what other kind of documentation you had in mind? This is exactly what I was after. I was between compress_io.c and compress_io.h. Thank you. 
> > > > I had posted v19-0001 for a committer's consideration and v19-000{2,3} for completeness. > > > > Please find a rebased v20 attached. > > > > > > I took a quick look at 0001, so a couple comments (sorry if some of this > > > was already discussed in the thread): > > > > Much appreciated! > > > > > 1) I don't think a "refactoring" patch should reference particular > > > compression algorithms (lz4/zstd), and in particular I don't think we > > > should have "not yet implemented" messages. We only have a couple other > > > places doing that, when we didn't have a better choice. But here we can > > > simply reject the algorithm when parsing the options, we don't need to > > > do that in a dozen other places. > > > > I have now removed lz4/zstd from where they were present with the exception > > of pg_dump.c which is responsible for parsing. > > > I'm not sure I understand why leave the lz4/zstd in this place? You are right, it is not obvious. Those were added in 5e73a60488 which is already committed in master and I didn't want to backtrack. Of course, I am not opposing in doing so if you wish. > > > > 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep > > > "none" at the end. It might make backpatches harder. > > > > Agreed. However a 'default' is needed in order to avoid compilation warnings. > > Also note that 0002 completely does away with cases within WriteDataToArchive. > > > OK, although that's also a consequence of using a "switch" instead of > plan "if" branches. > > Furthermore, I'm not sure we really need the pg_fatal() about invalid > compression method in these default blocks. I mean, how could we even > get to these places when the build does not support the algorithm? All > of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...) > happens looong after the compressor was initialized and the method > checked, no? So maybe either this should simply do Assert(false) or use > a different error message. 
I like Assert(false). > > > 3) While building, I get a bunch of warnings about missing cfdopen() > > > prototype and pg_backup_archiver.c not knowing about cfdopen() and > > > adding an implicit prototype (so I doubt it actually works). > > > > Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got removed > > in 69fb29d1af. v20 failed to properly take 69fb29d1af into consideration. Note > > that cfdopen is removed in 0002 which explains why cfbot didn't complain. > > > OK. > > > > 4) "cfp" struct no longer wraps gzFile, but the comment was not updated. > > > FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be > > > better to have a "union" of correct types? > > > > Please find an updated comment and a union in place of the void *. Also > > note that 0002 completely does away with cfp in favour of a new struct > > CompressFileHandle. I maintained the void * there because it is used by > > private methods of the compressors. 0003 contains such an example with > > LZ4CompressorState. > > > I wonder if this (and also the previous item) makes sense to keep 0001 > and 0002 or to combine them. The "intermediate" state is a bit annoying. Agreed. It was initially submitted as one patch. Then it was requested to be split up in two parts, one to expand the use of the existing API and one to replace with the new interface. Unfortunately the expansion of usage of the existing API requires some tweaking, but that is not a very good reason for the current patch set. I should have done a better job there. Please find v22 attached, which combines back 0001 and 0002. It is missing the documentation that was discussed above as I wanted to give quick feedback. Let me know if you think that the combined version is the one to move forward with. Cheers, //Georgios > > > > 5) cfopen/cfdopen are missing comments. cfopen_internal has an updated > > comment, but that's a static function while cfopen/cfdopen are the > > actual API.
> > > > Added comments to cfopen/cfdopen. > > > OK. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
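[Editor's note: for readers following the documentation discussion above, the style agreed upon (code-level comments in the header, in the manner of struct _archiveHandle) might look roughly like the sketch below. The struct name, field names, and signatures here are illustrative stand-ins, not the actual CompressFileHandle definition from the patch.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch of a catalog-of-callbacks handle in the style
 * discussed for CompressFileHandle.  Each compression method installs
 * its own implementations at init time; callers never reference a
 * specific compression library directly.
 */
typedef struct CompressHandleSketch
{
    /* Open 'path' with the given mode; returns 0 on success. */
    int         (*open_func) (struct CompressHandleSketch *h,
                              const char *path, const char *mode);

    /* Read up to 'size' bytes into 'ptr'; returns the number read. */
    size_t      (*read_func) (struct CompressHandleSketch *h,
                              void *ptr, size_t size);

    /* Write 'size' bytes from 'ptr'; returns the number written. */
    size_t      (*write_func) (struct CompressHandleSketch *h,
                               const void *ptr, size_t size);

    /* Flush and release all method-private resources; 0 on success. */
    int         (*close_func) (struct CompressHandleSketch *h);

    /* Method-private state, e.g. a gzFile or an LZ4 stream context. */
    void       *private_data;
} CompressHandleSketch;

/* A trivial write callback that only counts bytes, to show usage. */
static size_t
count_write(struct CompressHandleSketch *h, const void *ptr, size_t size)
{
    (void) ptr;
    *(size_t *) h->private_data += size;
    return size;
}
```

The point is that one short comment per callback, stating its contract, lives next to the struct in the header rather than in sgml docs.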
Attachments
Hi, On 1/19/23 17:42, gkokolatos@pm.me wrote: > > ------- Original Message ------- > On Thursday, January 19th, 2023 at 4:45 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> >> On 1/18/23 20:05, gkokolatos@pm.me wrote: >> >>> ------- Original Message ------- >>> On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: >> >> I'm not sure I understand why leave the lz4/zstd in this place? > > You are right, it is not obvious. Those were added in 5e73a60488 which is > already committed in master and I didn't want to backtrack. Of course, I am > not opposing in doing so if you wish. > Ah, I didn't realize it was already added by earlier commit. In that case let's not worry about it. >> >>>> 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep >>>> "none" at the end. It might make backpatches harder. >>> >>> Agreed. However a 'default' is needed in order to avoid compilation warnings. >>> Also note that 0002 completely does away with cases within WriteDataToArchive. >> >> >> OK, although that's also a consequence of using a "switch" instead of >> plan "if" branches. >> >> Furthermore, I'm not sure we really need the pg_fatal() about invalid >> compression method in these default blocks. I mean, how could we even >> get to these places when the build does not support the algorithm? All >> of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...) >> happens looong after the compressor was initialized and the method >> checked, no? So maybe either this should simply do Assert(false) or use >> a different error message. > > I like Assert(false). > OK, good. Do you agree we should never actually get there, if the earlier checks work correctly? >> >>>> 4) "cfp" struct no longer wraps gzFile, but the comment was not updated. >>>> FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be >>>> better to have a "union" of correct types? 
>>> >>> Please find and updated comment and a union in place of the void *. Also >>> note that 0002 completely does away with cfp in favour of a new struct >>> CompressFileHandle. I maintained the void * there because it is used by >>> private methods of the compressors. 0003 contains such an example with >>> LZ4CompressorState. >> >> >> I wonder if this (and also the previous item) makes sense to keep 0001 >> and 0002 or to combine them. The "intermediate" state is a bit annoying. > > Agreed. It was initially submitted as one patch. Then it was requested to be > split up in two parts, one to expand the use of the existing API and one to > replace with the new interface. Unfortunately the expansion of usage of the > existing API requires some tweaking, but that is not a very good reason for > the current patch set. I should have done a better job there. > > Please find v22 attach which combines back 0001 and 0002. It is missing the > documentation that was discussed above as I wanted to give a quick feedback. > Let me know if you think that the combined version is the one to move forward > with. > Thanks, I'll take a look. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/19/23 18:55, Tomas Vondra wrote: > Hi, > > On 1/19/23 17:42, gkokolatos@pm.me wrote: >> >> ... >> >> Agreed. It was initially submitted as one patch. Then it was requested to be >> split up in two parts, one to expand the use of the existing API and one to >> replace with the new interface. Unfortunately the expansion of usage of the >> existing API requires some tweaking, but that is not a very good reason for >> the current patch set. I should have done a better job there. >> >> Please find v22 attached, which combines back 0001 and 0002. It is missing the >> documentation that was discussed above as I wanted to give quick feedback. >> Let me know if you think that the combined version is the one to move forward >> with. >> > > Thanks, I'll take a look. > After taking a look and thinking about it a bit more, I think we should keep the two parts separate. I think Michael (or whoever proposed the split) was right: it makes the patches easier to grok. Sorry for the noise, hopefully we can just revert to the last version. While reading the thread, I also noticed this: > By the way, I think that this 0002 should drop all the default clauses > in the switches for the compression method so as we'd catch any > missing code paths with compiler warnings if a new compression method > is added in the future. Now I realize why there were "not yet implemented" errors for lz4/zstd in all the switches, and why after removing them you had to add a default branch. We DON'T want a default branch, because the idea is that after adding a new compression algorithm, we get warnings about switches not handling it correctly. So I guess we should walk back this change too :-( It's probably easier to go back to v20 from January 16, and redo the couple remaining things I commented on. FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048, but without implementation, was not a great idea. 
It mostly defeats the idea of getting the compiler warnings - all the places already handle PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd have to grep for the options, inspect all the places or something like that anyway. The warnings would only work for entirely new methods. However, I now also realize the compressor API in 0002 replaces all of this with calls to a generic API callback, so trying to improve this was pretty silly of me. Please fix the couple remaining details in v20, add the docs for the callbacks, and I'll try to polish it and get it committed. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
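[Editor's note: the compiler-warning rationale behind dropping the default clauses can be seen in a tiny standalone sketch. The enum and function below are illustrative stand-ins, not pg_dump's actual definitions: with no `default:`, `-Wswitch` (implied by `-Wall`) flags every switch that fails to handle a newly added algorithm, which a default branch would silently swallow.]

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for pg_compress_algorithm. */
typedef enum
{
    SK_NONE,
    SK_GZIP,
    SK_LZ4
    /*
     * Adding SK_ZSTD here would make the switch below trigger a
     * -Wswitch warning until this call site handles it -- exactly the
     * safety net a "default:" clause would defeat.
     */
} sk_algorithm;

static const char *
sk_file_suffix(sk_algorithm alg)
{
    switch (alg)
    {
        case SK_NONE:
            return "";
        case SK_GZIP:
            return ".gz";
        case SK_LZ4:
            return ".lz4";
            /* no default: new enum values must be handled explicitly */
    }
    assert(0);                  /* unreachable when all cases covered */
    return NULL;
}
```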
------- Original Message ------- On Friday, January 20th, 2023 at 12:34 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > On 1/19/23 18:55, Tomas Vondra wrote: > > > Hi, > > > > On 1/19/23 17:42, gkokolatos@pm.me wrote: > > > > > ... > > > > > > Agreed. It was initially submitted as one patch. Then it was requested to be > > > split up in two parts, one to expand the use of the existing API and one to > > > replace with the new interface. Unfortunately the expansion of usage of the > > > existing API requires some tweaking, but that is not a very good reason for > > > the current patch set. I should have done a better job there. > > > > > > Please find v22 attach which combines back 0001 and 0002. It is missing the > > > documentation that was discussed above as I wanted to give a quick feedback. > > > Let me know if you think that the combined version is the one to move forward > > > with. > > > > Thanks, I'll take a look. > > > After taking a look and thinking about it a bit more, I think we should > keep the two parts separate. I think Michael (or whoever proposed) the > split was right, it makes the patches easier to grok. > Excellent. I will attempt a better split this time round. > > While reading the thread, I also noticed this: > > > By the way, I think that this 0002 should drop all the default clauses > > in the switches for the compression method so as we'd catch any > > missing code paths with compiler warnings if a new compression method > > is added in the future. > > > Now I realize why there were "not yet implemented" errors for lz4/zstd > in all the switches, and why after removing them you had to add a > default branch. > > We DON'T want a default branch, because the idea is that after adding a > new compression algorithm, we get warnings about switches not handling > it correctly. 
> > So I guess we should walk back this change too :-( It's probably easier > to go back to v20 from January 16, and redo the couple remaining things > I commented on. > Sure. > > FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048, > but without implementation, was not a great idea. It mostly defeats the > idea of getting the compiler warnings - all the places already handle > PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd > have to grep for the options, inspect all the places or something like > that anyway. The warnings would only work for entirely new methods. > > However, I now also realize the compressor API in 0002 replaces all of > this with calls to a generic API callback, so trying to improve this was > pretty silly from me. I can try to do a better job at splitting things up. > > Please, fix the couple remaining details in v20, add the docs for the > callbacks, and I'll try to polish it and get it committed. Excellent. Allow me an attempt to polish and expect a new version soon. Cheers, //Georgios > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
------- Original Message ------- On Friday, January 20th, 2023 at 12:34 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > On 1/19/23 18:55, Tomas Vondra wrote: > > > Hi, > > > > On 1/19/23 17:42, gkokolatos@pm.me wrote: > > > > > ... > > > > > > Agreed. It was initially submitted as one patch. Then it was requested to be > > > split up in two parts, one to expand the use of the existing API and one to > > > replace with the new interface. Unfortunately the expansion of usage of the > > > existing API requires some tweaking, but that is not a very good reason for > > > the current patch set. I should have done a better job there. > > > > > > Please find v22 attached, which combines back 0001 and 0002. It is missing the > > > documentation that was discussed above as I wanted to give quick feedback. > > > Let me know if you think that the combined version is the one to move forward > > > with. > > > > Thanks, I'll take a look. > > > After taking a look and thinking about it a bit more, I think we should > keep the two parts separate. I think Michael (or whoever proposed the > split) was right: it makes the patches easier to grok. Please find attached v23 which reintroduces the split. 0001 is reworked to have a smaller footprint than before. Also, in an attempt to facilitate readability, 0002 splits the APIs and the uncompressed implementation into separate files. > > While reading the thread, I also noticed this: > > > By the way, I think that this 0002 should drop all the default clauses > > in the switches for the compression method so as we'd catch any > > missing code paths with compiler warnings if a new compression method > > is added in the future. > > > Now I realize why there were "not yet implemented" errors for lz4/zstd > in all the switches, and why after removing them you had to add a > default branch. 
> > We DON'T want a default branch, because the idea is that after adding a > new compression algorithm, we get warnings about switches not handling > it correctly. > > So I guess we should walk back this change too :-( It's probably easier > to go back to v20 from January 16, and redo the couple remaining things > I commented on. No problem. > FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048, > but without implementation, was not a great idea. It mostly defeats the > idea of getting the compiler warnings - all the places already handle > PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd > have to grep for the options, inspect all the places or something like > that anyway. The warnings would only work for entirely new methods. > > However, I now also realize the compressor API in 0002 replaces all of > this with calls to a generic API callback, so trying to improve this was > pretty silly from me. > > > Please, fix the couple remaining details in v20, add the docs for the > callbacks, and I'll try to polish it and get it committed. Thank you very much. Please find an attempt to comply with the requested changes in the attached. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote: > Please find attached v23 which reintroduces the split. > > 0001 is reworked to have a reduced footprint than before. Also in an attempt > to facilitate the readability, 0002 splits the API's and the uncompressed > implementation in separate files. Thanks for updating the patch. Could you address the review comments I sent here ? https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com Thanks, -- Justin
------- Original Message ------- On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote: > > > Please find attached v23 which reintroduces the split. > > > > 0001 is reworked to have a reduced footprint than before. Also in an attempt > > to facilitate the readability, 0002 splits the API's and the uncompressed > > implementation in separate files. > > > Thanks for updating the patch. Could you address the review comments I > sent here ? > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com Please find v24 attached. Cheers, //Georgios > > Thanks, > -- > Justin
Attachments
On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos@pm.me wrote: > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote: > > > > > Please find attached v23 which reintroduces the split. > > > > > > 0001 is reworked to have a smaller footprint than before. Also in an attempt > > > to facilitate the readability, 0002 splits the APIs and the uncompressed > > > implementation in separate files. > > > > Thanks for updating the patch. Could you address the review comments I > > sent here ? > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com > > Please find v24 attached. Thanks for updating the patch. In 001, RestoreArchive() does: > -#ifndef HAVE_LIBZ > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && > - AH->PrintTocDataPtr != NULL) > + supports_compression = false; > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE || > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > + supports_compression = true; > + > + if (AH->PrintTocDataPtr != NULL) > { > for (te = AH->toc->next; te != AH->toc; te = te->next) > { > if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > + { > +#ifndef HAVE_LIBZ > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > + supports_compression = false; > +#endif > + if (supports_compression == false) > + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > + } > } > } > -#endif This first checks if the algorithm is implemented, and then checks if the algorithm is supported by the current build - that confused me for a bit. It seems unnecessary to check for unimplemented algorithms before looping. That also requires referencing both GZIP and LZ4 in two places. 
I think it could be written to avoid the need to change for added compression algorithms: + if (te->hadDumper && (te->reqs & REQ_DATA) != 0) + { + /* Check if the compression algorithm is supported */ + pg_compress_specification spec; + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec); + if (spec->parse_error != NULL) + pg_fatal(spec->parse_error); + } Or maybe add a new function to compression.c to indicate whether a given algorithm is supported. That would also indicate *which* compression library isn't supported. Other than that, I think 001 is ready. 002/003 use these names, which I think are too similar - initially I didn't even realize there were two separate functions (each with a second stub function to handle the case of unsupported compression): +extern void InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec); +extern void InitCompressGzip(CompressFileHandle *CFH, const pg_compress_specification compression_spec); +extern void InitCompressorLZ4(CompressorState *cs, const pg_compress_specification compression_spec); +extern void InitCompressLZ4(CompressFileHandle *CFH, const pg_compress_specification compression_spec); typo: s/not build with/not built with/ Should AllocateCompressor() set cs->compression_spec, rather than doing it in each compressor ? Thanks for considering. -- Justin
------- Original Message ------- On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos@pm.me wrote: > > > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby pryzby@telsasoft.com wrote: > > > > > On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote: > > > > > > > Please find attached v23 which reintroduces the split. > > > > > > > > 0001 is reworked to have a reduced footprint than before. Also in an attempt > > > > to facilitate the readability, 0002 splits the API's and the uncompressed > > > > implementation in separate files. > > > > > > Thanks for updating the patch. Could you address the review comments I > > > sent here ? > > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com > > > > Please find v24 attached. > > > Thanks for updating the patch. > > In 001, RestoreArchive() does: > > > -#ifndef HAVE_LIBZ > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && > > - AH->PrintTocDataPtr != NULL) > > + supports_compression = false; > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE || > > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > + supports_compression = true; > > + > > + if (AH->PrintTocDataPtr != NULL) > > { > > for (te = AH->toc->next; te != AH->toc; te = te->next) > > { > > if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > > + { > > +#ifndef HAVE_LIBZ > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > + supports_compression = false; > > +#endif > > + if (supports_compression == false) > > + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > > + } > > } > > } > > -#endif > > > This first checks if the algorithm is implemented, and then checks if > the algorithm is supported by the current build - that 
confused me for a > bit. It seems unnecessary to check for unimplemented algorithms before > looping. That also requires referencing both GZIP and LZ4 in two > places. I am not certain that it is unnecessary, at least not in the way that is described. The idea is that new compression methods can be added, without changing the archive's version number. It is very possible that it is requested to restore an archive compressed with a method not implemented in the current binary. The first check takes care of that and sets supports_compression only for the supported versions. It is possible to enter the loop with supports_compression already set to false, for example because the archive was compressed with ZSTD, triggering the fatal error. Of course, one can throw the error before entering the loop, yet I think that it does not help the readability of the code. IMHO it is easier to follow if the error is thrown once during that check. > > I think it could be written to avoid the need to change for added > compression algorithms: > > + if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > > + { > + /* Check if the compression algorithm is supported */ > + pg_compress_specification spec; > + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec); > > + if (spec->parse_error != NULL) > > + pg_fatal(spec->parse_error); > > + } I am not certain how that would work in the example with ZSTD above. If I am not wrong, parse_compress_specification() will not throw an error if the codebase supports ZSTD, yet this specific pg_dump binary will not support it because ZSTD is not implemented. parse_compress_specification() is not aware of that and should not be aware of it, should it? > > Or maybe add a new function to compression.c to indicate whether a given > algorithm is supported. I am not certain how this would help, as compression.c is supposed to be used by multiple binaries while this is a pg_dump specific detail. 
> That would also indicate which compression library isn't supported. If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(), and remove altogether this check. On the other hand, I like the belts and suspenders approach because there are no more checks after this point. > Other than that, I think 001 is ready. Thank you. > 002/003 use these names, which I think are too similar - initially I > didn't even realize there were two separate functions (each with a > second stub function to handle the case of unsupported compression): > > +extern void InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec); > +extern void InitCompressGzip(CompressFileHandle *CFH, const pg_compress_specification compression_spec); > > +extern void InitCompressorLZ4(CompressorState *cs, const pg_compress_specification compression_spec); > +extern void InitCompressLZ4(CompressFileHandle *CFH, const pg_compress_specification compression_spec); Fair enough. Names are now updated. > > typo: > s/not build with/not built with/ Thank you. > > Should AllocateCompressor() set cs->compression_spec, rather than doing > it in each compressor ? I think that compression_spec should be owned by each compressor. With that in mind, it makes more sense to set it within each compressor. This is not a hill I am willing to die on though. Please find v25 attached. > > Thanks for considering. > > -- > Justin
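[Editor's note: the AllocateCompressor() design point above can be sketched as follows. All type and function names here are illustrative stand-ins, not the patch's actual API; the sketch shows the alternative Justin raises, where the central allocator sets the spec once, while the patch as posted keeps the assignment inside each compressor on the grounds that the spec is owned by it.]

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins; not the actual pg_dump definitions. */
typedef enum { SPEC_NONE, SPEC_GZIP, SPEC_LZ4 } spec_algorithm;

typedef struct
{
    spec_algorithm algorithm;
} spec_t;

typedef struct
{
    spec_t      compression_spec;
    /* ... per-method callbacks and private state would follow ... */
} compressor_state;

/*
 * Sketch of a central allocator that sets compression_spec exactly
 * once, so each per-method init function no longer needs to.
 */
static compressor_state *
allocate_compressor_sketch(spec_t spec)
{
    compressor_state *cs = calloc(1, sizeof(compressor_state));

    cs->compression_spec = spec;    /* set centrally, exactly once */

    switch (spec.algorithm)
    {
        case SPEC_NONE:
        case SPEC_GZIP:
        case SPEC_LZ4:
            /* per-method init would install callbacks here */
            break;
    }
    return cs;
}
```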
Attachments
On 1/25/23 16:37, gkokolatos@pm.me wrote: > > > > > > ------- Original Message ------- > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > >> >> >> On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos@pm.me wrote: >> >>> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby pryzby@telsasoft.com wrote: >>> >>>> On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote: >>>> >>>>> Please find attached v23 which reintroduces the split. >>>>> >>>>> 0001 is reworked to have a reduced footprint than before. Also in an attempt >>>>> to facilitate the readability, 0002 splits the API's and the uncompressed >>>>> implementation in separate files. >>>> >>>> Thanks for updating the patch. Could you address the review comments I >>>> sent here ? >>>> https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com >>> >>> Please find v24 attached. >> >> >> Thanks for updating the patch. >> >> In 001, RestoreArchive() does: >> >>> -#ifndef HAVE_LIBZ >>> - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && >>> - AH->PrintTocDataPtr != NULL) >>> + supports_compression = false; >>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE || >>> + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) >>> + supports_compression = true; >>> + >>> + if (AH->PrintTocDataPtr != NULL) >>> { >>> for (te = AH->toc->next; te != AH->toc; te = te->next) >>> { >>> if (te->hadDumper && (te->reqs & REQ_DATA) != 0) >>> - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); >>> + { >>> +#ifndef HAVE_LIBZ >>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) >>> + supports_compression = false; >>> +#endif >>> + if (supports_compression == false) >>> + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); >>> + } >>> } >>> } >>> -#endif >> >> >> This first checks if the algorithm is implemented, and then checks if >> 
the algorithm is supported by the current build - that confused me for a >> bit. It seems unnecessary to check for unimplemented algorithms before >> looping. That also requires referencing both GZIP and LZ4 in two >> places. > > I am not certain that it is unnecessary, at least not in the way that is > described. The idea is that new compression methods can be added, without > changing the archive's version number. It is very possible that it is > requested to restore an archive compressed with a method not implemented > in the current binary. The first check takes care of that and sets > supports_compression only for the supported versions. It is possible to > enter the loop with supports_compression already set to false, for example > because the archive was compressed with ZSTD, triggering the fatal error. > > Of course, one can throw the error before entering the loop, yet I think > that it does not help the readability of the code. IMHO it is easier to > follow if the error is thrown once during that check. > Actually, I don't understand why 0001 moves the check into the loop. I mean, why not check HAVE_LIBZ before the loop? >> >> I think it could be written to avoid the need to change for added >> compression algorithms: >> >> + if (te->hadDumper && (te->reqs & REQ_DATA) != 0) >> >> + { >> + /* Check if the compression algorithm is supported */ >> + pg_compress_specification spec; >> + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec); >> >> + if (spec->parse_error != NULL) >> >> + pg_fatal(spec->parse_error); >> >> + } > > I am not certain how that would work in the example with ZSTD above. > If I am not wrong, parse_compress_specification() will not throw an error > if the codebase supports ZSTD, yet this specific pg_dump binary will not > support it because ZSTD is not implemented. parse_compress_specification() > is not aware of that and should not be aware of it, should it? > Not sure. What happens in a similar situation now? 
That is, when trying to deal with an archive gzip-compressed in a build without libz? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 25, 2023 at 03:37:12PM +0000, gkokolatos@pm.me wrote: > Of course, one can throw the error before entering the loop, yet I think > that it does not help the readability of the code. IMHO it is easier to > follow if the error is thrown once during that check. > If anything, I can suggest to throw an error much earlier, i.e. in ReadHead(), > and remove altogether this check. On the other hand, I like the belts > and suspenders approach because there are no more checks after this point. While looking at this, I realized that commit 5e73a6048 introduced a regression: @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) - if (AH->compression != 0) - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) + pg_fatal("archive is compressed, but this installation does not support compression"); Before, it was possible to restore non-data chunks of a dump file, even if the current build didn't support its compression. But that's now impossible - and it makes the code we're discussing in RestoreArchive() unreachable. I don't think we can currently test for that, since it requires creating a dump using a build --with compression and then trying to restore using a build --without compression. The coverage report disagrees with me, though... https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901 > > I think it could be written to avoid the need to change for added > compression algorithms: ... > > I am not certain how that would work in the example with ZSTD above. > If I am not wrong, parse_compress_specification() will not throw an error > if the codebase supports ZSTD, yet this specific pg_dump binary will not > support it because ZSTD is not implemented. parse_compress_specification() > is not aware of that and should not be aware of it, should it? You're right. 
I think the 001 patch should try to remove hardcoded references to LIBZ/GZIP, such that the later patches don't need to update those same places for LZ4. For example in ReadHead() and RestoreArchive(), and maybe other places dealing with file extensions. Maybe that could be done by adding a function specific to pg_dump indicating whether or not an algorithm is implemented and supported. -- Justin
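[Editor's note: the pg_dump-local helper Justin suggests, reporting whether an algorithm is both implemented and compiled in, could be sketched roughly as below. The function name and enum are illustrative assumptions, not the committed API; the actual availability macros are HAVE_LIBZ and HAVE_LIBLZ4 as seen in the patch hunks above.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for pg_compress_algorithm. */
typedef enum
{
    SK_COMPR_NONE,
    SK_COMPR_GZIP,
    SK_COMPR_LZ4,
    SK_COMPR_ZSTD
} sk_compr_algorithm;

/*
 * Report whether this pg_dump build can process data compressed with
 * 'alg': the method must be implemented in pg_dump *and* the library
 * must have been compiled in.  Centralizing the check means ReadHead()
 * and RestoreArchive() need no per-library #ifdef blocks of their own.
 */
static bool
sk_supports_compression(sk_compr_algorithm alg)
{
    switch (alg)
    {
        case SK_COMPR_NONE:
            return true;
        case SK_COMPR_GZIP:
#ifdef HAVE_LIBZ
            return true;
#else
            return false;
#endif
        case SK_COMPR_LZ4:
#ifdef HAVE_LIBLZ4
            return true;
#else
            return false;
#endif
        case SK_COMPR_ZSTD:
            return false;       /* not implemented in pg_dump */
    }
    return false;               /* keep compilers quiet */
}
```

A caller such as ReadHead() would then need only one line: `if (!sk_supports_compression(alg)) pg_fatal(...)`.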
------- Original Message ------- On Wednesday, January 25th, 2023 at 6:28 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > On 1/25/23 16:37, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby pryzby@telsasoft.com wrote: > > > > > On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos@pm.me wrote: > > > > > > > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby pryzby@telsasoft.com wrote: > > > > > > > > > On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote: > > > > > > > > > > > Please find attached v23 which reintroduces the split. > > > > > > > > > > > > 0001 is reworked to have a reduced footprint than before. Also in an attempt > > > > > > to facilitate the readability, 0002 splits the API's and the uncompressed > > > > > > implementation in separate files. > > > > > > > > > > Thanks for updating the patch. Could you address the review comments I > > > > > sent here ? > > > > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com > > > > > > > > Please find v24 attached. > > > > > > Thanks for updating the patch. 
> > > > > > In 001, RestoreArchive() does: > > > > > > > -#ifndef HAVE_LIBZ > > > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && > > > > - AH->PrintTocDataPtr != NULL) > > > > + supports_compression = false; > > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE || > > > > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > + supports_compression = true; > > > > + > > > > + if (AH->PrintTocDataPtr != NULL) > > > > { > > > > for (te = AH->toc->next; te != AH->toc; te = te->next) > > > > { > > > > if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > > > > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > > > > + { > > > > +#ifndef HAVE_LIBZ > > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > + supports_compression = false; > > > > +#endif > > > > + if (supports_compression == false) > > > > + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > > > > + } > > > > } > > > > } > > > > -#endif > > > > > > This first checks if the algorithm is implemented, and then checks if > > > the algorithm is supported by the current build - that confused me for a > > > bit. It seems unnecessary to check for unimplemented algorithms before > > > looping. That also requires referencing both GZIP and LZ4 in two > > > places. > > > > I am not certain that it is unnecessary, at least not in the way that is > > described. The idea is that new compression methods can be added, without > > changing the archive's version number. It is very possible that it is > > requested to restore an archive compressed with a method not implemented > > in the current binary. The first check takes care of that and sets > > supports_compression only for the supported versions. 
It is possible to > > enter the loop with supports_compression already set to false, for example > > because the archive was compressed with ZSTD, triggering the fatal error. > > > > Of course, one can throw the error before entering the loop, yet I think > > that it does not help the readability of the code. IMHO it is easier to > > follow if the error is thrown once during that check. > > > Actually, I don't understand why 0001 moves the check into the loop. I > mean, why not check HAVE_LIBZ before the loop? The intention is to be able to restore archives that don't contain data. In that case compression becomes irrelevant as only the data in an archive is compressed. > > > > I think it could be written to avoid the need to change for added > > > compression algorithms: > > > > > > + if (te->hadDumper && (te->reqs & REQ_DATA) != 0) > > > > > > + { > > > + /* Check if the compression algorithm is supported */ > > > + pg_compress_specification spec; > > > + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec); > > > > > > + if (spec->parse_error != NULL) > > > > > > + pg_fatal(spec->parse_error); > > > > > > + } > > > > I am not certain how that would work in the example with ZSTD above. > > If I am not wrong, parse_compress_specification() will not throw an error > > if the codebase supports ZSTD, yet this specific pg_dump binary will not > > support it because ZSTD is not implemented. parse_compress_specification() > > is not aware of that and should not be aware of it, should it? > > > Not sure. What happens in a similar situation now? That is, when trying > to deal with an archive gzip-compressed in a build without libz? In case that there are no data chunks, the archive will be restored. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
------- Original Message ------- On Wednesday, January 25th, 2023 at 7:00 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Wed, Jan 25, 2023 at 03:37:12PM +0000, gkokolatos@pm.me wrote: > > While looking at this, I realized that commit 5e73a6048 introduced a > regression: > > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) > > - if (AH->compression != 0) > > - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > + pg_fatal("archive is compressed, but this installation does not support compression"); > > Before, it was possible to restore non-data chunks of a dump file, even > if the current build didn't support its compression. But that's now > impossible - and it makes the code we're discussing in RestoreArchive() > unreachable. Nice catch! Cheers, //Georgios > -- > Justin
On Wed, Jan 25, 2023 at 07:57:18PM +0000, gkokolatos@pm.me wrote: > Nice catch! Let me see.. -- Michael
Attachments
On Wed, Jan 25, 2023 at 12:00:20PM -0600, Justin Pryzby wrote: > While looking at this, I realized that commit 5e73a6048 introduced a > regression: > > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) > > - if (AH->compression != 0) > - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > + pg_fatal("archive is compressed, but this installation does not support compression"); > > Before, it was possible to restore non-data chunks of a dump file, even > if the current build didn't support its compression. But that's now > impossible - and it makes the code we're discussing in RestoreArchive() > unreachable. Right. This impacts the possibility of looking at the header data, which is useful with pg_restore -l for example. On a dump that's been compressed, pg_restore <= 15 would always print the TOC entries with or without compression support. On HEAD, this code prevents the header lookup. All *nix or BSD platforms should have support for zlib, I hope.. Still that could be an issue on Windows, and this would prevent folks from checking the contents of a dump after saving it on a WIN32 host, so let's undo that. So, I have been testing the attached with four sets of binaries from 15/HEAD and with[out] zlib support, and this brings HEAD back to the pre-15 state (header information able to show up, still failure when attempting to restore the dump's data without zlib). > I don't think we can currently test for that, since it requires creating a dump > using a build --with compression and then trying to restore using a build > --without compression. Right, the location of the data is in the header, and I don't see how you would be able to do that without two sets of binaries at hand, but our tests run under the assumption that you have only one. 
Well, that's not entirely true either, as you could create a TAP test like pg_upgrade that relies on an environment variable pointing to a second set of binaries. That's not worth the complication involved, IMO. > The coverage report disagrees with me, though... > https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901 Isn't that one of the tests like compression_gzip_plain? Thoughts? -- Michael
Attachments
On Thu, Jan 26, 2023 at 02:49:27PM +0900, Michael Paquier wrote: > On Wed, Jan 25, 2023 at 12:00:20PM -0600, Justin Pryzby wrote: > > While looking at this, I realized that commit 5e73a6048 introduced a > > regression: > > > > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) > > > > - if (AH->compression != 0) > > - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > + pg_fatal("archive is compressed, but this installation does not support compression"); > > > > Before, it was possible to restore non-data chunks of a dump file, even > > if the current build didn't support its compression. But that's now > > impossible - and it makes the code we're discussing in RestoreArchive() > > unreachable. > > Right. This impacts the possibility of looking at the header data, > which is useful with pg_restore -l for example. It's not just header data - it's schema and (I think) everything other than table data. > > The coverage report disagrees with me, though... > > https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901 > > Isn't that one of the tests like compression_gzip_plain? I'm not sure what you mean. Plain dump is restored with psql and not with pg_restore. My line number was wrong: https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#390 What test would hit that code without rebuilding ? 394 : #ifndef HAVE_LIBZ 395 : if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && > Thoughts? 
> #ifndef HAVE_LIBZ > if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > - pg_fatal("archive is compressed, but this installation does not support compression"); > + pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); Your patch is fine for now, but these errors should eventually specify *which* compression algorithm is unavailable. I think that should be a part of the 001 patch, ideally in a way that minimizes the number of places which need to be updated when adding an algorithm. -- Justin
------- Original Message ------- On Thursday, January 26th, 2023 at 7:28 AM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Thu, Jan 26, 2023 at 02:49:27PM +0900, Michael Paquier wrote: > > > On Wed, Jan 25, 2023 at 12:00:20PM -0600, Justin Pryzby wrote: > > > > > While looking at this, I realized that commit 5e73a6048 introduced a > > > regression: > > > > > > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) > > > > > > - if (AH->compression != 0) > > > - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > + pg_fatal("archive is compressed, but this installation does not support compression"); > > > > > > Before, it was possible to restore non-data chunks of a dump file, even > > > if the current build didn't support its compression. But that's now > > > impossible - and it makes the code we're discussing in RestoreArchive() > > > unreachable. > > > > Right. The impacts the possibility of looking at the header data, > > which is useful with pg_restore -l for example. > > > It's not just header data - it's schema and (I think) everything other > than table data. > > > > The coverage report disagrees with me, though... > > > https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#3901 > > > > Isn't that one of the tests like compression_gzip_plain? > > > I'm not sure what you mean. Plain dump is restored with psql and not > with pg_restore. > > My line number was wrong: > https://coverage.postgresql.org/src/bin/pg_dump/pg_backup_archiver.c.gcov.html#390 > > What test would hit that code without rebuilding ? > > 394 : #ifndef HAVE_LIBZ > 395 : if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP && > > > Thoughts? 
> > #ifndef HAVE_LIBZ > > if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > - pg_fatal("archive is compressed, but this installation does not support compression"); > > + pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > > > Your patch is fine for now, but these errors should eventually specify > which compression algorithm is unavailable. I think that should be a > part of the 001 patch, ideally in a way that minimizes the number of > places which need to be updated when adding an algorithm. I gave this a little bit of thought. I think that ReadHead should not emit a warning, or at least not this warning as it is slightly misleading. It implies that it will automatically turn off data restoration, which is false. Further ahead, the code will fail with a conflicting error message stating that the compression is not available. Instead, it would be cleaner both for the user and the maintainer to move the check in RestoreArchive and make it the sole responsible for this logic. Please find v26 attached. 0001 does the above and 0002 addresses Justin's complaints regarding the code footprint. //Cheers, Georgios > > -- > Justin
Attachments
On Thu, Jan 26, 2023 at 11:24:47AM +0000, gkokolatos@pm.me wrote: > I gave this a little bit of thought. I think that ReadHead should not > emit a warning, or at least not this warning as it is slightly misleading. > It implies that it will automatically turn off data restoration, which is > false. Further ahead, the code will fail with a conflicting error message > stating that the compression is not available. > > Instead, it would be cleaner both for the user and the maintainer to > move the check in RestoreArchive and make it the sole responsible for > this logic. - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); + pg_fatal("cannot restore data from compressed archive (compression not supported in this installation)"); Hmm. I don't mind changing this part as you suggest. -#ifndef HAVE_LIBZ - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) - pg_fatal("archive is compressed, but this installation does not support compression"); -#endif However I think that we'd better keep the warning, as it can offer a hint when using pg_restore -l not built with compression support if looking at a dump that has been compressed. -- Michael
Attachments
------- Original Message ------- On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Thu, Jan 26, 2023 at 11:24:47AM +0000, gkokolatos@pm.me wrote: > > > I gave this a little bit of thought. I think that ReadHead should not > > emit a warning, or at least not this warning as it is slightly misleading. > > It implies that it will automatically turn off data restoration, which is > > false. Further ahead, the code will fail with a conflicting error message > > stating that the compression is not available. > > > > Instead, it would be cleaner both for the user and the maintainer to > > move the check in RestoreArchive and make it the sole responsible for > > this logic. > > > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > + pg_fatal("cannot restore data from compressed archive (compression not supported in this installation)"); > Hmm. I don't mind changing this part as you suggest. > > -#ifndef HAVE_LIBZ > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > - pg_fatal("archive is compressed, but this installation does not support compression"); > -#endif > However I think that we'd better keep the warning, as it can offer a > hint when using pg_restore -l not built with compression support if > looking at a dump that has been compressed. Fair enough. Please find v27 attached. Cheers, //Georgios > -- > Michael
Attachments
On Wed, Jan 25, 2023 at 07:57:18PM +0000, gkokolatos@pm.me wrote: > On Wednesday, January 25th, 2023 at 7:00 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > While looking at this, I realized that commit 5e73a6048 introduced a > > regression: > > > > @@ -3740,19 +3762,24 @@ ReadHead(ArchiveHandle *AH) > > > > - if (AH->compression != 0) > > > > - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > + pg_fatal("archive is compressed, but this installation does not support compression"); > > > > Before, it was possible to restore non-data chunks of a dump file, even > > if the current build didn't support its compression. But that's now > > impossible - and it makes the code we're discussing in RestoreArchive() > > unreachable. On Thu, Jan 26, 2023 at 08:53:28PM +0900, Michael Paquier wrote: > On Thu, Jan 26, 2023 at 11:24:47AM +0000, gkokolatos@pm.me wrote: > > I gave this a little bit of thought. I think that ReadHead should not > > emit a warning, or at least not this warning as it is slightly misleading. > > It implies that it will automatically turn off data restoration, which is > > false. Further ahead, the code will fail with a conflicting error message > > stating that the compression is not available. > > > > Instead, it would be cleaner both for the user and the maintainer to > > move the check in RestoreArchive and make it the sole responsible for > > this logic. > > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)"); > + pg_fatal("cannot restore data from compressed archive (compression not supported in this installation)"); > Hmm. I don't mind changing this part as you suggest. 
> > -#ifndef HAVE_LIBZ > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > - pg_fatal("archive is compressed, but this installation does not support compression"); > -#endif > However I think that we'd better keep the warning, as it can offer a > hint when using pg_restore -l not built with compression support if > looking at a dump that has been compressed. Yeah. But the original log_warning text was better, and should be restored: - if (AH->compression != 0) - pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available"); That commit also added this to pg_dump.c: + case PG_COMPRESSION_ZSTD: + pg_fatal("compression with %s is not yet supported", "ZSTD"); + break; + case PG_COMPRESSION_LZ4: + pg_fatal("compression with %s is not yet supported", "LZ4"); + break; In 002, that could be simplified by re-using the supports_compression() function. (And maybe the same in WriteDataToArchive()?) -- Justin
On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote: > Yeah. But the original log_warning text was better, and should be > restored: > > - if (AH->compression != 0) > - pg_log_warning("archive is compressed, but this installation does not support compression -- no data willbe available"); Yeah, this one's on me. So I have gone with the simplest solution and applied a fix to restore the original behavior, with the same warning showing up. -- Michael
Attachments
On Mon, Jan 16, 2023 at 11:54:46AM +0900, Michael Paquier wrote: > On Sun, Jan 15, 2023 at 07:56:25PM -0600, Justin Pryzby wrote: > > On Mon, Jan 16, 2023 at 10:28:50AM +0900, Michael Paquier wrote: > >> The functions changed by 0001 are cfopen[_write](), > >> AllocateCompressor() and ReadDataFromArchive(). Why is it a good idea > >> to change these interfaces which basically exist to handle inputs? > > > > I changed to pass pg_compress_specification as a pointer, since that's > > the usual convention for structs, as followed by the existing uses of > > pg_compress_specification. > > Okay, but what do we gain here? It seems to me that this introduces > the risk that a careless change in one of the internal routines if > they change slightly compress_spec, hence impacting any of their > callers? Or is that fixing an actual bug (except if I am missing your > point, that does not seem to be the case)? To circle back to this: I was not saying there's any bug. The proposed change was only to follow the normal, existing conventions for passing structs. It could also be a pointer to const. It's fine with me if you say that it's intentional how it's written already. > >> Is there some benefit in changing compression_spec within the > >> internals of these routines before going back one layer down to their > >> callers? Changing the compression_spec on-the-fly in these internal > >> paths could be risky, actually, no? > > > > I think what you're saying is that if the spec is passed as a pointer, > > then the called functions shouldn't set spec->algorithm=something. > > Yes. HEAD makes sure of that, 0001 would not prevent that. So I am a > bit confused in seeing how this is a benefit.
On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote: > That commit also added this to pg_dump.c: > > + case PG_COMPRESSION_ZSTD: > + pg_fatal("compression with %s is not yet supported", "ZSTD"); > + break; > + case PG_COMPRESSION_LZ4: > + pg_fatal("compression with %s is not yet supported", "LZ4"); > + break; > > In 002, that could be simplified by re-using the supports_compression() > function. (And maybe the same in WriteDataToArchive()?) The first patch aims to minimize references to ".gz" and "GZIP" and ZLIB. pg_backup_directory.c comments still refer to ".gz". I think the patch should ideally change to refer to "the compressed file extension" (similar to compress_io.c), avoiding the need to update it later. I think the file extension stuff could be generalized, so it doesn't need to be updated in multiple places (pg_backup_directory.c and compress_io.c). Maybe it's useful to add a function to return the extension of a given compression method. It could go in compression.c, and be useful in basebackup. For the 2nd patch: I might be in the minority, but I still think some references to "gzip" should say "zlib": +} GzipCompressorState; + +/* Private routines that support gzip compressed data I/O */ +static void +DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush) In my mind, three things here are misleading, because it doesn't use gzip headers: | GzipCompressorState, DeflateCompressorGzip, "gzip compressed". This comment is about exactly that: * underlying stream. The second API is a wrapper around fopen/gzopen and * friends, providing an interface similar to those, but abstracts away * the possible compression. Both APIs use libz for the compression, but * the second API uses gzip headers, so the resulting files can be easily * manipulated with the gzip utility. AIUI, Michael says that it's fine that the user-facing command-line options use "-Z gzip" (even though the "custom" format doesn't use gzip headers). 
I'm okay with that, as long as that's discussed/understood. -- Justin
------- Original Message ------- On Friday, January 27th, 2023 at 6:23 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Thu, Jan 26, 2023 at 12:22:45PM -0600, Justin Pryzby wrote: > > > That commit also added this to pg-dump.c: > > > > + case PG_COMPRESSION_ZSTD: > > + pg_fatal("compression with %s is not yet supported", "ZSTD"); > > + break; > > + case PG_COMPRESSION_LZ4: > > + pg_fatal("compression with %s is not yet supported", "LZ4"); > > + break; > > > > In 002, that could be simplified by re-using the supports_compression() > > function. (And maybe the same in WriteDataToArchive()?) > > > The first patch aims to minimize references to ".gz" and "GZIP" and > ZLIB. pg_backup_directory.c comments still refers to ".gz". I think > the patch should ideally change to refer to "the compressed file > extension" (similar to compress_io.c), avoiding the need to update it > later. > > I think the file extension stuff could be generalized, so it doesn't > need to be updated in multiple places (pg_backup_directory.c and > compress_io.c). Maybe it's useful to add a function to return the > extension of a given compression method. It could go in compression.c, > and be useful in basebackup. > > For the 2nd patch: > > I might be in the minority, but I still think some references to "gzip" > should say "zlib": > > +} GzipCompressorState; > + > +/* Private routines that support gzip compressed data I/O */ > +static void > +DeflateCompressorGzip(ArchiveHandle *AH, CompressorState *cs, bool flush) > > In my mind, three things here are misleading, because it doesn't use > gzip headers: > > | GzipCompressorState, DeflateCompressorGzip, "gzip compressed". > > This comment is about exactly that: > > * underlying stream. The second API is a wrapper around fopen/gzopen and > * friends, providing an interface similar to those, but abstracts away > * the possible compression. 
Both APIs use libz for the compression, but > * the second API uses gzip headers, so the resulting files can be easily > * manipulated with the gzip utility. > > AIUI, Michael says that it's fine that the user-facing command-line > options use "-Z gzip" (even though the "custom" format doesn't use gzip > headers). I'm okay with that, as long as that's discussed/understood. > Thank you for the input Justin. I am currently waiting for input from a third person to reach some conclusion. I thought that it should be stated before my inactivity is mistaken for indifference, which it is not. Cheers, //Georgios > -- > Justin
On Tue, Jan 31, 2023 at 09:00:56AM +0000, gkokolatos@pm.me wrote: > > In my mind, three things here are misleading, because it doesn't use > > gzip headers: > > > > | GzipCompressorState, DeflateCompressorGzip, "gzip compressed". > > > > This comment is about exactly that: > > > > * underlying stream. The second API is a wrapper around fopen/gzopen and > > * friends, providing an interface similar to those, but abstracts away > > * the possible compression. Both APIs use libz for the compression, but > > * the second API uses gzip headers, so the resulting files can be easily > > * manipulated with the gzip utility. > > > > AIUI, Michael says that it's fine that the user-facing command-line > > options use "-Z gzip" (even though the "custom" format doesn't use gzip > > headers). I'm okay with that, as long as that's discussed/understood. > > Thank you for the input Justin. I am currently waiting for input from a > third person to reach some conclusion. I thought that it should be stated > before my inactivity is mistaken for indifference, which it is not. I'm not sure what there is to lose by making the names more accurate - especially since they're private/internal-only. Tomas marked himself as a committer, so maybe he could comment. It'd be nice to also come to some conclusion about whether -Fc -Z gzip is confusing (due to not actually using gzip). BTW, do you intend to merge this for v16 ? I verified in earlier patch versions that tests all pass with lz4 as the default compression method. And checked that gzip output is compatible with before, and that old dumps restore correctly, and there's no memory leaks or other errors. -- Justin
On Fri, Jan 27, 2023 2:04 AM gkokolatos@pm.me <gkokolatos@pm.me> wrote: > > ------- Original Message ------- > On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier > <michael@paquier.xyz> wrote: > > > > > > > > On Thu, Jan 26, 2023 at 11:24:47AM +0000, gkokolatos@pm.me wrote: > > > > > I gave this a little bit of thought. I think that ReadHead should not > > > emit a warning, or at least not this warning as it is slightly misleading. > > > It implies that it will automatically turn off data restoration, which is > > > false. Further ahead, the code will fail with a conflicting error message > > > stating that the compression is not available. > > > > > > Instead, it would be cleaner both for the user and the maintainer to > > > move the check in RestoreArchive and make it the sole responsible for > > > this logic. > > > > > > - pg_fatal("cannot restore from compressed archive (compression not > supported in this installation)"); > > + pg_fatal("cannot restore data from compressed archive (compression not > supported in this installation)"); > > Hmm. I don't mind changing this part as you suggest. > > > > -#ifndef HAVE_LIBZ > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > - pg_fatal("archive is compressed, but this installation does not support > compression"); > > -#endif > > However I think that we'd better keep the warning, as it can offer a > > hint when using pg_restore -l not built with compression support if > > looking at a dump that has been compressed. > > Fair enough. Please find v27 attached. > Hi, I am interested in this feature and tried the patch. While reading the comments, I noticed some minor things that could possibly be improved (in v27-0003 patch). 1. + /* + * Open a file for writing. + * + * 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already + * initialized CompressFileHandle. 
+ */ + int (*open_write_func) (const char *path, const char *mode, + CompressFileHandle *CFH); There is a redundant single quote in front of 'w'. 2. /* * Callback function for WriteDataToArchive. Writes one block of (compressed) * data to the archive. */ /* * Callback function for ReadDataFromArchive. To keep things simple, we * always read one compressed block at a time. */ Should the function names in the comments be updated? WriteDataToArchive -> writeData ReadDataFromArchive -> readData 3. + Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0); Could we use PG_BINARY_R instead of "r" and "rb" here? Regards, Shi Yu
------- Original Message ------- On Wednesday, February 15th, 2023 at 2:51 PM, shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > Hi, > > I am interested in this feature and tried the patch. While reading the comments, > I noticed some minor things that could possibly be improved (in v27-0003 patch). Thank you very much for the interest. Please find a rebased v28 attached. Due to the rebase, 0001 of v27 is no longer relevant and has been removed. Your comments are applied on v28-0002. > > 1. > + /* > + * Open a file for writing. > + * > + * 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already > + * initialized CompressFileHandle. > + */ > + int (*open_write_func) (const char *path, const char *mode, > + CompressFileHandle *CFH); > > There is a redundant single quote in front of 'w'. Fixed. > > 2. > /* > * Callback function for WriteDataToArchive. Writes one block of (compressed) > * data to the archive. > */ > /* > * Callback function for ReadDataFromArchive. To keep things simple, we > * always read one compressed block at a time. > */ > > Should the function names in the comments be updated? Agreed. Fixed. > > 3. > + Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0); > > Could we use PG_BINARY_R instead of "r" and "rb" here? We could and we should. Using PG_BINARY_R has the added benefit of needing only one strcmp() call. Fixed. Cheers, //Georgios > > Regards, > Shi Yu
Attachments
Hi Georgios, I spent some time looking at the patch again, and IMO it's RFC. But I need some help with the commit messages - I updated 0001 and 0002 but I wasn't quite sure what some of the stuff meant to say, and/or whether it was left over from an earlier patch version and now obsolete. Could you go over them and check if I got it right? Also feel free to update the list of reviewers (I compiled that from substantial reviews on the thread). The 0003 commit message seems somewhat confusing - I added some XXX lines asking about unclear stuff. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
Some little updates since I last checked: + * This file also includes the implementation when compression is none for + * both API's. => this comment is obsolete. s/deffer/infer/ ? or determine ? This typo occurs multiple times. currently this includes only ".gz" => remove this phrase from the 002 patch (or at least update it in 003). deferred by iteratively => inferred? s/Requrires/Requires/ twice. s/occured/occurred/ s/disc/disk/ ? Probably unimportant, but "disc" isn't used anywhere else. "compress file handle" => maybe these should say "compressed" supports_compression(): Since this is an exported function, it should probably be called pgdump_supports_compression.
------- Original Message ------- On Sunday, February 19th, 2023 at 6:10 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > Hi Georgios, > > I spent some time looking at the patch again, and IMO it's RFC. But I > need some help with the commit messages - I updated 0001 and 0002 but I > wasn't quite sure what some of the stuff meant to say and/or it seemed > maybe coming from an earlier patch version and obsolete. Thank you very much Tomas! Indeed I have not been paying any attention to the commit messages. > Could you go over them and check if I got it right? Also feel free to > update the list of reviewers (I compiled that from substantial reviews > on the thread). Done. Rachel has been correctly identified as author in the relevant parts up to commit 98fe74218d. After that, she had offered review comments and I have taken the liberty to add her as a reviewer throughout. Also I think that Shi Yu should be credited as a reviewer of 0003. > > The 0003 commit message seems somewhat confusing - I added some XXX > lines asking about unclear stuff. Please find in the attached v30 an updated message, as well as an amended reviewer list. Also v30 addresses the final comments raised by Justin. Cheers, //Georgios > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
Thanks for v30 with the updated commit messages. I've pushed 0001 after fixing a comment typo and removing (I think) an unnecessary change in an error message. I'll give the buildfarm a bit of time before pushing 0002 and 0003. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/23/23 16:26, Tomas Vondra wrote: > Thanks for v30 with the updated commit messages. I've pushed 0001 after > fixing a comment typo and removing (I think) an unnecessary change in an > error message. > > I'll give the buildfarm a bit of time before pushing 0002 and 0003. > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), and marked the CF entry as committed. Thanks for the patch! I wonder how difficult it would be to add zstd compression, so that we don't have the annoying "unsupported" cases. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote: > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), > and marked the CF entry as committed. Thanks for the patch! A big thanks from me to everyone involved. > I wonder how difficult would it be to add the zstd compression, so that > we don't have the annoying "unsupported" cases. I'll send a patch soon. I first submitted patches for that 2 years ago (before PGDG was ready to add zstd). https://commitfest.postgresql.org/31/2888/ -- Justin
On Thu, Feb 23, 2023 at 07:51:16PM -0600, Justin Pryzby wrote: > On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote: >> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), >> and marked the CF entry as committed. Thanks for the patch! > > A big thanks from me to everyone involved. Wow, nice! The APIs are clear to follow. > I'll send a patch soon. I first submitted patches for that 2 years ago > (before PGDG was ready to add zstd). > https://commitfest.postgresql.org/31/2888/ Thanks. It should be straight-forward to see that in 16, I guess. -- Michael
Attachments
------- Original Message ------- On Friday, February 24th, 2023 at 5:35 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Thu, Feb 23, 2023 at 07:51:16PM -0600, Justin Pryzby wrote: > > > On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote: > > > > > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), > > > and marked the CF entry as committed. Thanks for the patch! > > > > A big thanks from me to everyone involved. > > > Wow, nice! The APIs are clear to follow. I am out of words, thank you all so very much. I learned a lot. > > > I'll send a patch soon. I first submitted patches for that 2 years ago > > (before PGDG was ready to add zstd). > > https://commitfest.postgresql.org/31/2888/ > > > Thanks. It should be straight-forward to see that in 16, I guess. > -- > Michael
I have some fixes (attached) and questions while polishing the patch for zstd compression. The fixes are small and could be integrated with the patch for zstd, but could be applied independently. - I'm unclear about get_error_func(). That's called in three places from pg_backup_directory.c, after failures from write_func(), to supply a compression-specific error message to pg_fatal(). But it's not being used outside of directory format, nor for errors for other function pointers, or even for all errors in write_func(). Is there some reason why each compression method's write_func() shouldn't call pg_fatal() directly, with its compression-specific message? - I still think supports_compression() should be renamed, or made into a static function in the necessary file. The main reason is that it's more clear what it indicates - whether compression is "implemented by pgdump" and not whether compression is "supported by this postgres build". It also seems possible that we'd want to add a function called something like supports_compression(), indicating whether the algorithm is supported by the current build. It'd be better if pgdump didn't appropriate that name. - Finally, the "Nothing to do in the default case" comment comes from Michael's commit 5e73a6048: + /* + * Custom and directory formats are compressed by default with gzip when + * available, not the others. + */ + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && + !user_compression_defined) { #ifdef HAVE_LIBZ - if (archiveFormat == archCustom || archiveFormat == archDirectory) - compressLevel = Z_DEFAULT_COMPRESSION; - else + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, + &compression_spec); +#else + /* Nothing to do in the default case */ #endif - compressLevel = 0; } As the comment says: for -Fc and -Fd, the compression is set to zlib, if enabled, and when not otherwise specified by the user. 
Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when zlib was unavailable. But I'm not sure why there's now an empty "#else". I also don't know what "the default case" refers to. Maybe the best thing here is to move the preprocessor #if, since it's no longer in the middle of a runtime conditional: #ifdef HAVE_LIBZ + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && + !user_compression_defined) + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, + &compression_spec); #endif ...but that elicits a warning about "variable set but not used"... -- Justin
Attachments
On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > I have some fixes (attached) and questions while polishing the patch for > zstd compression. The fixes are small and could be integrated with the > patch for zstd, but could be applied independently. One more - WriteDataToArchiveGzip() says: + if (cs->compression_spec.level == 0) + pg_fatal("requested to compress the archive yet no level was specified"); That was added at e9960732a. But if you specify gzip:0, the compression level is already enforced by validate_compress_specification(), before hitting gzip.c: | pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1) 5e73a6048 intended that to work as before, and you *can* specify -Z0: The change is backward-compatible, hence specifying only an integer leads to no compression for a level of 0 and gzip compression when the level is greater than 0. $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 |file - /dev/stdin: ASCII text Right now, I think that pg_fatal in gzip.c is dead code - that was first added in the patch version sent on 21 Dec 2022. -- Justin
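For context on the level-0 debate: zlib itself accepts level 0 and simply emits "stored" (uncompressed) deflate blocks, which is the behavior the old `compressLevel = 0` path relied on. A minimal Python sketch using the stdlib `zlib` module (standing in for pg_dump's use of the C library):

```python
import zlib

data = b"0123456789abcdef" * 200  # 3200 bytes of repetitive input

# Level 0: zlib wraps the input in "stored" blocks -- a valid stream,
# but no shrinkage, only a few bytes of framing overhead.
c0 = zlib.compressobj(0)
stored = c0.compress(data) + c0.flush()

# Default-ish level 6: the same input shrinks to a small fraction.
c6 = zlib.compressobj(6)
packed = c6.compress(data) + c6.flush()

print(len(data), len(stored), len(packed))
assert zlib.decompress(stored) == data
assert zlib.decompress(packed) == data
assert len(stored) > len(data)   # stored blocks add framing bytes
assert len(packed) < len(data)
```

So "level 0" is a legal zlib configuration that copies input through, which is why rejecting it at the compressor layer versus the option-parsing layer is a policy choice rather than a correctness requirement.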
Attachments
On 2/25/23 06:02, Justin Pryzby wrote: > I have some fixes (attached) and questions while polishing the patch for > zstd compression. The fixes are small and could be integrated with the > patch for zstd, but could be applied independently. > > - I'm unclear about get_error_func(). That's called in three places > from pg_backup_directory.c, after failures from write_func(), to > supply an compression-specific error message to pg_fatal(). But it's > not being used outside of directory format, nor for errors for other > function pointers, or even for all errors in write_func(). Is there > some reason why each compression method's write_func() shouldn't call > pg_fatal() directly, with its compression-specific message ? > I think there are a couple more places that might/should call get_error_func(). For example ahwrite() in pg_backup_archiver.c now simply does if (bytes_written != size * nmemb) WRITE_ERROR_EXIT; but perhaps it should call get_error_func() too. There are probably other places that call write_func() and should use get_error_func(). > - I still think supports_compression() should be renamed, or made into a > static function in the necessary file. The main reason is that it's > more clear what it indicates - whether compression is "implemented by > pgdump" and not whether compression is "supported by this postgres > build". It also seems possible that we'd want to add a function > called something like supports_compression(), indicating whether the > algorithm is supported by the current build. It'd be better if pgdump > didn't subjugate that name. > If we choose to rename this to have pgdump_ prefix, fine with me. But I don't think there's a realistic chance of conflict, as it's restricted to pgdump header etc. And it's not part of an API, so I guess we could rename that in the future if needed. 
> - Finally, the "Nothing to do in the default case" comment comes from > Michael's commit 5e73a6048: > > + /* > + * Custom and directory formats are compressed by default with gzip when > + * available, not the others. > + */ > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && > + !user_compression_defined) > { > #ifdef HAVE_LIBZ > - if (archiveFormat == archCustom || archiveFormat == archDirectory) > - compressLevel = Z_DEFAULT_COMPRESSION; > - else > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, > + &compression_spec); > +#else > + /* Nothing to do in the default case */ > #endif > - compressLevel = 0; > } > > > As the comment says: for -Fc and -Fd, the compression is set to zlib, if > enabled, and when not otherwise specified by the user. > > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, *and* when > zlib was unavailable. > > But I'm not sure why there's now an empty "#else". I also don't know > what "the default case" refers to. > > Maybe the best thing here is to move the preprocessor #if, since it's no > longer in the middle of a runtime conditional: > > #ifdef HAVE_LIBZ > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && > + !user_compression_defined) > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, > + &compression_spec); > #endif > > ...but that elicits a warning about "variable set but not used"... > Not sure, I need to think about this a bit. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > > I have some fixes (attached) and questions while polishing the patch for > > zstd compression. The fixes are small and could be integrated with the > > patch for zstd, but could be applied independently. > > One more - WriteDataToArchiveGzip() says: One more again. The LZ4 path is using non-streaming mode, which compresses each block without persistent state, giving poor compression for -Fc compared with -Fp. If the data is highly compressible, the difference can be orders of magnitude. $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c 12351763 $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c 21890708 That's not true for gzip: $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c 2118869 $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c 2115832 The function ought to at least use streaming mode, so each block/row isn't compressed in isolation. 003 is a simple patch to use streaming mode, which improves the -Fc case: $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c 15178283 However, that still flushes the compression buffer, writing a block header, for every row. With a single-column table, pg_dump -Fc -Z lz4 still outputs ~10% *more* data than with no compression at all. And that's for compressible data. $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c 12890296 $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c 11890296 I think this should use the LZ4F API with frames, which are buffered to avoid outputting a header for every single row. The LZ4F format isn't compatible with the LZ4 format, so (unlike changing to the streaming API) that's not something we can change in a bugfix release. I consider this an Opened Item. With the LZ4F API in 004, -Fp and -Fc are essentially the same size (like gzip). 
(Oh, and the output is three times smaller, too.) $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c 4155448 $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c 4156548 -- Justin
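The flush-per-row overhead Justin measures is not specific to LZ4: any compressor whose state is flushed (and its history reset) at every row boundary degenerates toward per-row framing. A sketch of the same effect using the Python stdlib `zlib` module — used here only as a stand-in, since LZ4 has no stdlib binding:

```python
import zlib

# 1000 identical 64-byte "rows" of highly compressible data.
rows = [b"0123456789abcdef" * 4] * 1000

# Streaming: one compressor across all rows, flushed once at the end.
c = zlib.compressobj(6)
streamed = b"".join(c.compress(r) for r in rows) + c.flush()

# Flush-per-row: Z_FULL_FLUSH emits a block boundary and resets the
# history window after every row, so each row compresses in isolation.
c = zlib.compressobj(6)
chunks = []
for r in rows:
    chunks.append(c.compress(r))
    chunks.append(c.flush(zlib.Z_FULL_FLUSH))
chunks.append(c.flush())
per_row = b"".join(chunks)

# Both are valid streams holding the same data...
assert zlib.decompress(streamed) == zlib.decompress(per_row) == b"".join(rows)
# ...but flushing per row costs a large multiple in output size.
assert len(per_row) > 5 * len(streamed)
print(len(streamed), len(per_row))
```

This is the general principle behind preferring a frame-oriented, buffered API: boundaries (and their headers) should be amortized over many rows, not paid once per row.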
Attachments
------- Original Message ------- On Sunday, February 26th, 2023 at 3:59 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 2/25/23 06:02, Justin Pryzby wrote: > > > I have some fixes (attached) and questions while polishing the patch for > > zstd compression. The fixes are small and could be integrated with the > > patch for zstd, but could be applied independently. > > > > - I'm unclear about get_error_func(). That's called in three places > > from pg_backup_directory.c, after failures from write_func(), to > > supply an compression-specific error message to pg_fatal(). But it's > > not being used outside of directory format, nor for errors for other > > function pointers, or even for all errors in write_func(). Is there > > some reason why each compression method's write_func() shouldn't call > > pg_fatal() directly, with its compression-specific message ? > > > I think there are a couple more places that might/should call > get_error_func(). For example ahwrite() in pg_backup_archiver.c now > simply does > > if (bytes_written != size * nmemb) > WRITE_ERROR_EXIT; > > but perhaps it should call get_error_func() too. There are probably > other places that call write_func() and should use get_error_func(). Agreed, calling get_error_func() would be preferable to a fatal error. It should be the caller of the api who decides how to proceed. > > > - I still think supports_compression() should be renamed, or made into a > > static function in the necessary file. The main reason is that it's > > more clear what it indicates - whether compression is "implemented by > > pgdump" and not whether compression is "supported by this postgres > > build". It also seems possible that we'd want to add a function > > called something like supports_compression(), indicating whether the > > algorithm is supported by the current build. It'd be better if pgdump > > didn't subjugate that name. > > > If we choose to rename this to have pgdump_ prefix, fine with me. 
> But I don't think there's a realistic chance of conflict, as it's restricted > to pgdump header etc. And it's not part of an API, so I guess we could > rename that in the future if needed. Agreed, it is very unrealistic that one will include that header file anywhere but within pg_dump. Also, I think that adding a prefix, "pgdump", "pg_dump", or similar does not add value and subtracts readability. > > > - Finally, the "Nothing to do in the default case" comment comes from > > Michael's commit 5e73a6048: > > > > + /* > > + * Custom and directory formats are compressed by default with gzip when > > + * available, not the others. > > + */ > > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && > > + !user_compression_defined) > > { > > #ifdef HAVE_LIBZ > > - if (archiveFormat == archCustom || archiveFormat == archDirectory) > > - compressLevel = Z_DEFAULT_COMPRESSION; > > - else > > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, > > + &compression_spec); > > +#else > > + /* Nothing to do in the default case */ > > #endif > > - compressLevel = 0; > > } > > > > As the comment says: for -Fc and -Fd, the compression is set to zlib, if > > enabled, and when not otherwise specified by the user. > > > > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when > > zlib was unavailable. > > > > But I'm not sure why there's now an empty "#else". I also don't know > > what "the default case" refers to. > > > > Maybe the best thing here is to move the preprocessor #if, since it's no > > longer in the middle of a runtime conditional: > > > > #ifdef HAVE_LIBZ > > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && > > + !user_compression_defined) > > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, > > + &compression_spec); > > #endif > > > > ...but that elicits a warning about "variable set but not used"... > > > Not sure, I need to think about this a bit. Not having warnings is preferable, isn't it? 
I can understand the confusion on the message though. Maybe a phrasing like: /* Nothing to do for the default case when LIBZ is not available */ is easier to understand. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
------- Original Message ------- On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > > > I have some fixes (attached) and questions while polishing the patch for > > zstd compression. The fixes are small and could be integrated with the > > patch for zstd, but could be applied independently. Please find some comments on the rest of the fixes patch that Tomas has not commented on. can be compressed with the <application>gzip</application> or - <application>lz4</application>tool. + <application>lz4</application> tools. +1 The compression method can be set to <literal>gzip</literal> or - <literal>lz4</literal> or <literal>none</literal> for no compression. + <literal>lz4</literal>, or <literal>none</literal> for no compression. I am not a native English speaker. Yet I think that if one adds commas in one of the options, then one should add commas to all the options. Namely, the above is missing a comma between gzip and lz4. However I think that not having any commas still works grammatically and syntactically. - /* - * A level of zero simply copies the input one block at the time. This - * is probably not what the user wanted when calling this interface. - */ - if (cs->compression_spec.level == 0) - pg_fatal("requested to compress the archive yet no level was specified"); I disagree with this change. WriteDataToArchiveGzip() is far away from whatever the code in pg_dump.c is doing. Any invalid value for level will emit an error when the proper gzip/zlib code is called. A zero value however, will not emit such an error. Having the extra check there is a future-proof guarantee at a very low cost. Furthermore, it quickly informs the reader of the code about that specific value, helping with readability and comprehension. If any change is required, something I strongly vote against, I would at least recommend replacing it with an assertion. 
- * Initialize a compress file stream. Deffer the compression algorithm + * Initialize a compress file stream. Infer the compression algorithm :+1: - # Skip command-level tests for gzip if there is no support for it. + # Skip command-level tests for gzip/lz4 if they're not supported. We will be back at that again soon. Maybe change to: Skip command-level test for unsupported compression methods To include everything. - ($pgdump_runs{$run}->{compile_option} eq 'gzip' && !$supports_gzip) || - ($pgdump_runs{$run}->{compile_option} eq 'lz4' && !$supports_lz4)) + (($pgdump_runs{$run}->{compile_option} eq 'gzip' && !$supports_gzip) || + ($pgdump_runs{$run}->{compile_option} eq 'lz4' && !$supports_lz4))) Good catch, :+1: Cheers, //Georgios > -- > Justin
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote: > On 2/23/23 16:26, Tomas Vondra wrote: > > Thanks for v30 with the updated commit messages. I've pushed 0001 after > > fixing a comment typo and removing (I think) an unnecessary change in an > > error message. > > > > I'll give the buildfarm a bit of time before pushing 0002 and 0003. > > > > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), > and marked the CF entry as committed. Thanks for the patch! I found that e9960732a broke writing of empty gzip-compressed data, specifically LOs. pg_dump succeeds, but then the restore fails: postgres=# SELECT lo_create(1234); lo_create | 1234 $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v pg_restore: implied data-only restore pg_restore: executing BLOB 1234 pg_restore: processing BLOBS pg_restore: restoring large object with OID 1234 pg_restore: error: could not uncompress data: (null) The inline patch below fixes it, but you won't be able to apply it directly, as it's on top of other patches which rename the functions back to "Zlib" and rearrange the functions to their original order, to allow running: git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c The current function order avoids 3 lines of declarations, but it's obviously pretty useful to be able to run that diff command. I already argued for not calling the functions "Gzip" on the grounds that the name was inaccurate. I'd want to create an empty large object in src/test/sql/largeobject.sql to get this tested during pg_upgrade. But unfortunately that doesn't use -Fc, so this isn't hit. Empty input is an important enough test case to justify a tap test, if there's no better way. 
diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c index f3f5e87c9a8..68f3111b2fe 100644 --- a/src/bin/pg_dump/compress_gzip.c +++ b/src/bin/pg_dump/compress_gzip.c @@ -55,6 +55,32 @@ InitCompressorZlib(CompressorState *cs, gzipcs = (ZlibCompressorState *) pg_malloc0(sizeof(ZlibCompressorState)); cs->private_data = gzipcs; + + if (cs->writeF) + { + z_streamp zp; + zp = gzipcs->zp = (z_streamp) pg_malloc0(sizeof(z_stream)); + zp->zalloc = Z_NULL; + zp->zfree = Z_NULL; + zp->opaque = Z_NULL; + + /* + * outsize is the buffer size we tell zlib it can output to. We + * actually allocate one extra byte because some routines want to append a + * trailing zero byte to the zlib output. + */ + + gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1); + gzipcs->outsize = ZLIB_OUT_SIZE; + + if (deflateInit(gzipcs->zp, cs->compression_spec.level) != Z_OK) + pg_fatal("could not initialize compression library: %s", + zp->msg); + + /* Just be paranoid - maybe End is called after Start, with no Write */ + zp->next_out = gzipcs->outbuf; + zp->avail_out = gzipcs->outsize; + } } static void @@ -63,7 +89,7 @@ EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs) ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data; z_streamp zp; - if (gzipcs->zp) + if (cs->writeF != NULL) { zp = gzipcs->zp; zp->next_in = NULL; @@ -131,29 +157,6 @@ WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState *cs, const void *data, size_t dLen) { ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data; - z_streamp zp; - - if (!gzipcs->zp) - { - zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream)); - zp->zalloc = Z_NULL; - zp->zfree = Z_NULL; - zp->opaque = Z_NULL; - - /* - * outsize is the buffer size we tell zlib it can output to. We - * actually allocate one extra byte because some routines want to - * append a trailing zero byte to the zlib output. 
- */ - gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1); - gzipcs->outsize = ZLIB_OUT_SIZE; - - if (deflateInit(zp, cs->compression_spec.level) != Z_OK) - pg_fatal("could not initialize compression library: %s", zp->msg); - - zp->next_out = gzipcs->outbuf; - zp->avail_out = gzipcs->outsize; - } gzipcs->zp->next_in = (void *) unconstify(void *, data); gzipcs->zp->avail_in = dLen;
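The shape of the bug the diff above fixes — compressor state created lazily on the first write, then torn down even when nothing was ever written — can be sketched in Python. The class names here are hypothetical illustrations, with stdlib `zlib` standing in for the gzip codepath:

```python
import zlib

class LazyCompressor:
    """Pre-fix shape: zlib state is only created on the first write()."""
    def __init__(self):
        self.zp = None          # deferred, like the old WriteDataToArchiveZlib
        self.out = b""

    def write(self, data):
        if self.zp is None:     # init-on-first-write
            self.zp = zlib.compressobj()
        self.out += self.zp.compress(data)

    def end(self):
        # Blows up if write() never ran: self.zp is still None.
        self.out += self.zp.flush()
        return self.out

class EagerCompressor(LazyCompressor):
    """Post-fix shape: init happens up front, so empty members are valid."""
    def __init__(self):
        super().__init__()
        self.zp = zlib.compressobj()

empty = EagerCompressor().end()        # empty LO: End with no Write at all
assert zlib.decompress(empty) == b""   # still a well-formed (empty) stream

try:
    LazyCompressor().end()
except AttributeError:
    print("lazy init fails on an empty member")
```

The eager variant is what the fix does: moving `deflateInit()` into the Init function guarantees that End always sees a valid stream, even for a zero-byte large object.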
Attachments
On Tue, Feb 28, 2023 at 05:58:34PM -0600, Justin Pryzby wrote: > I found that e9960732a broke writing of empty gzip-compressed data, > specifically LOs. pg_dump succeeds, but then the restore fails: The number of issues you have been reporting here begins to worry me. How many of them have you found? Is it right to assume that all of them have basically 03d02f5 as their oldest origin point? -- Michael
Attachments
------- Original Message ------- On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby@telsasoft.com> wrote: > I found that e9960732a broke writing of empty gzip-compressed data, > specifically LOs. pg_dump succeeds, but then the restore fails: > > postgres=# SELECT lo_create(1234); > lo_create | 1234 > > $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v > pg_restore: implied data-only restore > pg_restore: executing BLOB 1234 > pg_restore: processing BLOBS > pg_restore: restoring large object with OID 1234 > pg_restore: error: could not uncompress data: (null) > Thank you for looking. This was an untested case. > The inline patch below fixes it, but you won't be able to apply it > directly, as it's on top of other patches which rename the functions > back to "Zlib" and rearranges the functions to their original order, to > allow running: > > git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c > Please find a patch attached that can be applied directly. > The current function order avoids 3 lines of declarations, but it's > obviously pretty useful to be able to run that diff command. I already > argued for not calling the functions "Gzip" on the grounds that the name > was inaccurate. I have no idea why we are back on the naming issue. I stand by the name because in my humble opinion it helps the code reader. There is a certain uniformity when the compression_spec.algorithm and the compressor functions match, as the following code sample shows. 
if (compression_spec.algorithm == PG_COMPRESSION_NONE) InitCompressorNone(cs, compression_spec); else if (compression_spec.algorithm == PG_COMPRESSION_GZIP) InitCompressorGzip(cs, compression_spec); else if (compression_spec.algorithm == PG_COMPRESSION_LZ4) InitCompressorLZ4(cs, compression_spec); When the reader wants to see what happens when PG_COMPRESSION_XXX is set, they simply have to search for the XXX part. I think that this is justification enough for the use of the names. > > I'd want to create an empty large object in src/test/sql/largeobject.sql > to exercise this tested during pgupgrade. But unfortunately that > doesn't use -Fc, so this isn't hit. Empty input is an important enough > test case to justify a tap test, if there's no better way. Please find in the attached a test case that exercises this codepath. Cheers, //Georgios
Attachments
On 3/1/23 08:24, Michael Paquier wrote: > On Tue, Feb 28, 2023 at 05:58:34PM -0600, Justin Pryzby wrote: >> I found that e9960732a broke writing of empty gzip-compressed data, >> specifically LOs. pg_dump succeeds, but then the restore fails: > > The number of issues you have been reporting here begins to worries > me.. How many of them have you found? Is it right to assume that all > of them have basically 03d02f5 as oldest origin point? AFAICS a lot of the issues are more a discussion about wording in a couple places, whether it's nicer to do A or B, name the functions differently or what. I'm aware of three genuine issues that I intend to fix shortly: 1) incorrect "if" condition in a TAP test 2) failure when compressing empty LO (which we had no test for) 3) change in handling "compression level = 0" (which I believe should be made to behave like before) regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/1/23 14:39, gkokolatos@pm.me wrote: > > > > > > ------- Original Message ------- > On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > >> I found that e9960732a broke writing of empty gzip-compressed data, >> specifically LOs. pg_dump succeeds, but then the restore fails: >> >> postgres=# SELECT lo_create(1234); >> lo_create | 1234 >> >> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v >> pg_restore: implied data-only restore >> pg_restore: executing BLOB 1234 >> pg_restore: processing BLOBS >> pg_restore: restoring large object with OID 1234 >> pg_restore: error: could not uncompress data: (null) >> > > Thank you for looking. This was an untested case. > Yeah :-( >> The inline patch below fixes it, but you won't be able to apply it >> directly, as it's on top of other patches which rename the functions >> back to "Zlib" and rearranges the functions to their original order, to >> allow running: >> >> git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c >> > > Please find a patch attached that can be applied directly. > >> The current function order avoids 3 lines of declarations, but it's >> obviously pretty useful to be able to run that diff command. I already >> argued for not calling the functions "Gzip" on the grounds that the name >> was inaccurate. > > I have no idea why we are back on the naming issue. I stand by the name > because in my humble opinion helps the code reader. There is a certain > uniformity when the compression_spec.algorithm and the compressor > functions match as the following code sample shows. 
> > if (compression_spec.algorithm == PG_COMPRESSION_NONE) > InitCompressorNone(cs, compression_spec); > else if (compression_spec.algorithm == PG_COMPRESSION_GZIP) > InitCompressorGzip(cs, compression_spec); > else if (compression_spec.algorithm == PG_COMPRESSION_LZ4) > InitCompressorLZ4(cs, compression_spec); > > When the reader wants to see what happens when the PG_COMPRESSION_XXX > is set, has to simply search for the XXX part. I think that this is > justification enough for the use of the names. > I don't recall the previous discussion about the naming, but I'm not sure why it would be inaccurate. We call it 'gzip' pretty much everywhere, and I agree with Georgios that it helps to make this consistent with the PG_COMPRESSION_ stuff. The one thing that concerned me while reviewing it earlier was that it might make backpatching harder. But that's mostly irrelevant due to all the other changes I think. >> >> I'd want to create an empty large object in src/test/sql/largeobject.sql >> to exercise this tested during pgupgrade. But unfortunately that >> doesn't use -Fc, so this isn't hit. Empty input is an important enough >> test case to justify a tap test, if there's no better way. > > Please find in the attached a test case that exercises this codepath. > Thanks. That seems correct to me, but I find it somewhat confusing, because we now have DeflateCompressorInit vs. InitCompressorGzip DeflateCompressorEnd vs. EndCompressorGzip DeflateCompressorData - The name doesn't really say what it does (would be better to have a verb in there, I think). I wonder if we can make this somehow clearer? Also, InitCompressorGzip says this: /* * If the caller has defined a write function, prepare the necessary * state. Avoid initializing during the first write call, because End * may be called without ever writing any data. */ if (cs->writeF) DeflateCompressorInit(cs); Does it actually make sense to not have writeF defined in some cases? 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/27/23 15:56, gkokolatos@pm.me wrote: > > > > > > ------- Original Message ------- > On Saturday, February 25th, 2023 at 3:05 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > >> >> >> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: >> >>> I have some fixes (attached) and questions while polishing the patch for >>> zstd compression. The fixes are small and could be integrated with the >>> patch for zstd, but could be applied independently. > > > Please find some comments on the rest of the fixes patch that Tomas has not > commented on. > > can be compressed with the <application>gzip</application> or > - <application>lz4</application>tool. > + <application>lz4</application> tools. > > +1 > > The compression method can be set to <literal>gzip</literal> or > - <literal>lz4</literal> or <literal>none</literal> for no compression. > + <literal>lz4</literal>, or <literal>none</literal> for no compression. > > I am not a native English speaker. Yet I think that if one adds commas > in one of the options, then one should add commas to all the options. > Namely, the above is missing a comma between gzip and lz4. However I > think that not having any commas still works grammatically and > syntactically. > I pushed a fix with most of these wording changes. As for this comma, I believe the correct style is "a, b, or c". At least that's what the other places in the pg_dump.sgml file do. > - ($pgdump_runs{$run}->{compile_option} eq 'gzip' && !$supports_gzip) || > - ($pgdump_runs{$run}->{compile_option} eq 'lz4' && !$supports_lz4)) > + (($pgdump_runs{$run}->{compile_option} eq 'gzip' && !$supports_gzip) || > + ($pgdump_runs{$run}->{compile_option} eq 'lz4' && !$supports_lz4))) > Pushed a fix for this too. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/25/23 15:05, Justin Pryzby wrote: > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: >> I have some fixes (attached) and questions while polishing the patch for >> zstd compression. The fixes are small and could be integrated with the >> patch for zstd, but could be applied independently. > > One more - WriteDataToArchiveGzip() says: > > + if (cs->compression_spec.level == 0) > + pg_fatal("requested to compress the archive yet no level was specified"); > > That was added at e9960732a. > > But if you specify gzip:0, the compression level is already enforced by > validate_compress_specification(), before hitting gzip.c: > > | pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1and 9 (default at -1) > > 5e73a6048 intended that to work as before, and you *can* specify -Z0: > > The change is backward-compatible, hence specifying only an integer > leads to no compression for a level of 0 and gzip compression when the > level is greater than 0. > > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 |file - > /dev/stdin: ASCII text > FWIW I agree we should make this backwards-compatible - accept "0" and treat it as no compression. Georgios, can you prepare a patch doing that? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2/27/23 05:49, Justin Pryzby wrote: > On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: >> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: >>> I have some fixes (attached) and questions while polishing the patch for >>> zstd compression. The fixes are small and could be integrated with the >>> patch for zstd, but could be applied independently. >> >> One more - WriteDataToArchiveGzip() says: > > One more again. > > The LZ4 path is using non-streaming mode, which compresses each block > without persistent state, giving poor compression for -Fc compared with > -Fp. If the data is highly compressible, the difference can be orders > of magnitude. > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c > 12351763 > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > 21890708 > > That's not true for gzip: > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c > 2118869 > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c > 2115832 > > The function ought to at least use streaming mode, so each block/row > isn't compressed in isolation. 003 is a simple patch to use > streaming mode, which improves the -Fc case: > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > 15178283 > > However, that still flushes the compression buffer, writing a block > header, for every row. With a single-column table, pg_dump -Fc -Z lz4 > still outputs ~10% *more* data than with no compression at all. And > that's for compressible data. > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c > 12890296 > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c > 11890296 > > I think this should use the LZ4F API with frames, which are buffered to > avoid outputting a header for every single row. The LZ4F format isn't > compatible with the LZ4 format, so (unlike changing to the streaming > API) that's not something we can change in a bugfix release. 
I consider > this an Opened Item. > > With the LZ4F API in 004, -Fp and -Fc are essentially the same size > (like gzip). (Oh, and the output is three times smaller, too.) > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c > 4155448 > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c > 4156548 > Thanks. Those are definitely interesting improvements/optimizations! I suggest we track them as a separate patch series - please add them to the CF app (I guess you'll have to add them to 2023-07 at this point, but we can get them in, I think). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 01, 2023 at 01:39:14PM +0000, gkokolatos@pm.me wrote: > On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > The current function order avoids 3 lines of declarations, but it's > > obviously pretty useful to be able to run that diff command. I already > > argued for not calling the functions "Gzip" on the grounds that the name > > was inaccurate. > > I have no idea why we are back on the naming issue. I stand by the name > because in my humble opinion it helps the code reader. There is a certain > uniformity when the compression_spec.algorithm and the compressor > functions match as the following code sample shows. I mentioned that it's because this allows usefully running "diff" against the previous commits. > if (compression_spec.algorithm == PG_COMPRESSION_NONE) > InitCompressorNone(cs, compression_spec); > else if (compression_spec.algorithm == PG_COMPRESSION_GZIP) > InitCompressorGzip(cs, compression_spec); > else if (compression_spec.algorithm == PG_COMPRESSION_LZ4) > InitCompressorLZ4(cs, compression_spec); > > When the reader wants to see what happens when the PG_COMPRESSION_XXX > is set, one simply has to search for the XXX part. I think that this is > justification enough for the use of the names. You're right about that. But (with the exception of InitCompressorGzip), I'm referring to the naming of internal functions, static to gzip.c, so renaming can't be said to cause a loss of clarity. > > I'd want to create an empty large object in src/test/sql/largeobject.sql > > to get this tested during pg_upgrade. But unfortunately that > > doesn't use -Fc, so this isn't hit. Empty input is an important enough > > test case to justify a tap test, if there's no better way. > > Please find in the attached a test case that exercises this codepath. Thanks for writing it. This patch could be an opportunity to improve the "diff" output, without renaming anything. 
The old order of functions was:

-InitCompressorZlib
-EndCompressorZlib
-DeflateCompressorZlib
-WriteDataToArchiveZlib
-ReadDataFromArchiveZlib

If you put DeflateCompressorEnd immediately after DeflateCompressorInit, diff works nicely. You'll have to add at least one declaration, which seems very worth it.

-- Justin
------- Original Message ------- On Wednesday, March 1st, 2023 at 5:20 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 2/25/23 15:05, Justin Pryzby wrote: > > > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > > > > > I have some fixes (attached) and questions while polishing the patch for > > > zstd compression. The fixes are small and could be integrated with the > > > patch for zstd, but could be applied independently. > > > > One more - WriteDataToArchiveGzip() says: > > > > + if (cs->compression_spec.level == 0) > > + pg_fatal("requested to compress the archive yet no level was specified"); > > > > That was added at e9960732a. > > > > But if you specify gzip:0, the compression level is already enforced by > > validate_compress_specification(), before hitting gzip.c: > > > > | pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1) > > > > 5e73a6048 intended that to work as before, and you can specify -Z0: > > > > The change is backward-compatible, hence specifying only an integer > > leads to no compression for a level of 0 and gzip compression when the > > level is greater than 0. > > > > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 |file - > > /dev/stdin: ASCII text > > > FWIW I agree we should make this backwards-compatible - accept "0" and > treat it as no compression. > > Georgios, can you prepare a patch doing that? Please find a patch attached. However, I am a bit at a loss: the backwards-compatible behaviour has not changed. Passing -Z0/--compress=0 does produce a non-compressed output. So I am not really certain as to what broke and needs fixing. What commit 5e73a6048 failed to do is test the backwards-compatible behaviour. The attached amends it. Cheers, //Georgios > > > regards > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote: > On 2/25/23 15:05, Justin Pryzby wrote: > > On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > >> I have some fixes (attached) and questions while polishing the patch for > >> zstd compression. The fixes are small and could be integrated with the > >> patch for zstd, but could be applied independently. > > > > One more - WriteDataToArchiveGzip() says: > > > > + if (cs->compression_spec.level == 0) > > + pg_fatal("requested to compress the archive yet no level was specified"); > > > > That was added at e9960732a. > > > > But if you specify gzip:0, the compression level is already enforced by > > validate_compress_specification(), before hitting gzip.c: > > > > | pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1) > > > > 5e73a6048 intended that to work as before, and you *can* specify -Z0: > > > > The change is backward-compatible, hence specifying only an integer > > leads to no compression for a level of 0 and gzip compression when the > > level is greater than 0. > > > > $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 |file - > > /dev/stdin: ASCII text > > FWIW I agree we should make this backwards-compatible - accept "0" and > treat it as no compression. > > Georgios, can you prepare a patch doing that? I think maybe Tomas misunderstood. What I was trying to say is that -Z 0 *is* accepted to mean no compression. This part wasn't quoted, but I said: > Right now, I think that pg_fatal in gzip.c is dead code - that was first > added in the patch version sent on 21 Dec 2022. If you run the diff command that I've been talking about, you'll see that InitCompressorZlib was almost unchanged - e9960732 is essentially a refactoring. I don't think it's desirable to add a pg_fatal() in a function that's otherwise nearly-unchanged. 
The fact that it's nearly-unchanged is a good thing: it simplifies reading of what changed. If someone wants to add a pg_fatal() in that code path, it'd be better done in its own commit, with a separate message explaining the change. If you insist on changing anything here, you might add an assertion (as you said earlier) along with a comment like /* -Z 0 uses the "None" compressor rather than zlib with no compression */ -- Justin
On 3/2/23 18:18, Justin Pryzby wrote: > On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote: >> On 2/25/23 15:05, Justin Pryzby wrote: >>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: >>>> I have some fixes (attached) and questions while polishing the patch for >>>> zstd compression. The fixes are small and could be integrated with the >>>> patch for zstd, but could be applied independently. >>> >>> One more - WriteDataToArchiveGzip() says: >>> >>> + if (cs->compression_spec.level == 0) >>> + pg_fatal("requested to compress the archive yet no level was specified"); >>> >>> That was added at e9960732a. >>> >>> But if you specify gzip:0, the compression level is already enforced by >>> validate_compress_specification(), before hitting gzip.c: >>> >>> | pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1) >>> >>> 5e73a6048 intended that to work as before, and you *can* specify -Z0: >>> >>> The change is backward-compatible, hence specifying only an integer >>> leads to no compression for a level of 0 and gzip compression when the >>> level is greater than 0. >>> >>> $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 |file - >>> /dev/stdin: ASCII text >> >> FWIW I agree we should make this backwards-compatible - accept "0" and >> treat it as no compression. >> >> Georgios, can you prepare a patch doing that? > > I think maybe Tomas misunderstood. What I was trying to say is that -Z > 0 *is* accepted to mean no compression. This part wasn't quoted, but I > said: > Ah, I see. Well, I also tried but with "-Z gzip:0" (and not -Z 0), and that does fail: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1) It's a bit weird these two cases behave differently, when both translate to the same default compression method (gzip). 
>> Right now, I think that pg_fatal in gzip.c is dead code - that was first >> added in the patch version sent on 21 Dec 2022. > > If you run the diff command that I've been talking about, you'll see > that InitCompressorZlib was almost unchanged - e9960732 is essentially a > refactoring. I don't think it's desirable to add a pg_fatal() in a > function that's otherwise nearly-unchanged. The fact that it's > nearly-unchanged is a good thing: it simplifies reading of what changed. > If someone wants to add a pg_fatal() in that code path, it'd be better > done in its own commit, with a separate message explaining the change. > > If you insist on changing anything here, you might add an assertion (as > you said earlier) along with a comment like > /* -Z 0 uses the "None" compressor rather than zlib with no compression */ > Yeah, a comment would be helpful. Also, after thinking about it a bit more, maybe having the unreachable pg_fatal() is not a good thing, as it will just confuse people (I'd certainly assume that having such a check means there's a case in which it might trigger). Maybe an assert would be better? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 01, 2023 at 04:52:49PM +0100, Tomas Vondra wrote: > Thanks. That seems correct to me, but I find it somewhat confusing, > because we now have > > DeflateCompressorInit vs. InitCompressorGzip > > DeflateCompressorEnd vs. EndCompressorGzip > > DeflateCompressorData - The name doesn't really say what it does (would > be better to have a verb in there, I think). > > I wonder if we can make this somehow clearer? To move things along, I updated Georgios' patch: Rename DeflateCompressorData() to DeflateCompressorCommon(); Rearrange functions to their original order allowing a cleaner diff to the prior code; Change pg_fatal() to an assertion+comment; Update the commit message and fix a few typos; > Also, InitCompressorGzip says this: > > /* > * If the caller has defined a write function, prepare the necessary > * state. Avoid initializing during the first write call, because End > * may be called without ever writing any data. > */ > if (cs->writeF) > DeflateCompressorInit(cs); > > Does it actually make sense to not have writeF defined in some cases? InitCompressor is being called for either reading or writing, either of which could be null: src/bin/pg_dump/pg_backup_custom.c: ctx->cs = AllocateCompressor(AH->compression_spec, src/bin/pg_dump/pg_backup_custom.c- NULL, src/bin/pg_dump/pg_backup_custom.c- _CustomWriteFunc); -- src/bin/pg_dump/pg_backup_custom.c: cs = AllocateCompressor(AH->compression_spec, src/bin/pg_dump/pg_backup_custom.c- _CustomReadFunc, NULL); It's confusing that the comment says "Avoid initializing...". What it really means is "Initialize eagerly...". But that makes more sense in the context of the commit message for this bugfix than in a comment. So I changed that too. + /* If deflation was initialized, finalize it */ + if (cs->private_data) + DeflateCompressorEnd(AH, cs); Maybe it'd be more clear if this used "if (cs->writeF)", like in the init function ? -- Justin
Attachments
On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote: > On 2/27/23 05:49, Justin Pryzby wrote: > > On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: > >> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: > >>> I have some fixes (attached) and questions while polishing the patch for > >>> zstd compression. The fixes are small and could be integrated with the > >>> patch for zstd, but could be applied independently. > >> > >> One more - WriteDataToArchiveGzip() says: > > > > One more again. > > > > The LZ4 path is using non-streaming mode, which compresses each block > > without persistent state, giving poor compression for -Fc compared with > > -Fp. If the data is highly compressible, the difference can be orders > > of magnitude. > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c > > 12351763 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > > 21890708 > > > > That's not true for gzip: > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c > > 2118869 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c > > 2115832 > > > > The function ought to at least use streaming mode, so each block/row > > isn't compressioned in isolation. 003 is a simple patch to use > > streaming mode, which improves the -Fc case: > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c > > 15178283 > > > > However, that still flushes the compression buffer, writing a block > > header, for every row. With a single-column table, pg_dump -Fc -Z lz4 > > still outputs ~10% *more* data than with no compression at all. And > > that's for compressible data. > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c > > 12890296 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c > > 11890296 > > > > I think this should use the LZ4F API with frames, which are buffered to > > avoid outputting a header for every single row. 
The LZ4F format isn't > > compatible with the LZ4 format, so (unlike changing to the streaming > > API) that's not something we can change in a bugfix release. I consider > > this an Opened Item. > > > > With the LZ4F API in 004, -Fp and -Fc are essentially the same size > > (like gzip). (Oh, and the output is three times smaller, too.) > > > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c > > 4155448 > > $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c > > 4156548 > > Thanks. Those are definitely interesting improvements/optimizations! > > I suggest we track them as a separate patch series - please add them to > the CF app (I guess you'll have to add them to 2023-07 at this point, > but we can get them in, I think). Thanks for looking. I'm not sure if I'm the best person to write/submit the patch to implement that for LZ4. Georgios, would you want to take on this change? I think that needs to be changed for v16, since 1) LZ4F works so much better like this, and 2) we can't change it later without breaking compatibility of the dumpfiles by changing the header with another name other than "lz4". Also, I imagine we'd want to continue supporting the ability to *restore* a dumpfile using the old (current) format, which would be untestable code unless we also preserved the ability to write it somehow (like -Z lz4-old). One issue is that LZ4F_createCompressionContext() and LZ4F_compressBegin() ought to be called in InitCompressorLZ4(). It seems like it might *need* to be called there to avoid exactly the kind of issue that I reported with empty LOs with gzip. But InitCompressorLZ4() isn't currently passed the ArchiveHandle, so can't write the header. And LZ4CompressorState has a simple char *buf, and not a more elaborate data structure like zlib. You could work around that by storing the "len" of the existing buffer, and flushing it in EndCompressorLZ4(), but that adds needless complexity to the Write and End functions. 
Maybe the Init function should be passed the AH. -- Justin
On 3/9/23 17:15, Justin Pryzby wrote: > On Wed, Mar 01, 2023 at 05:39:54PM +0100, Tomas Vondra wrote: >> On 2/27/23 05:49, Justin Pryzby wrote: >>> On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote: >>>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote: >>>>> I have some fixes (attached) and questions while polishing the patch for >>>>> zstd compression. The fixes are small and could be integrated with the >>>>> patch for zstd, but could be applied independently. >>>> >>>> One more - WriteDataToArchiveGzip() says: >>> >>> One more again. >>> >>> The LZ4 path is using non-streaming mode, which compresses each block >>> without persistent state, giving poor compression for -Fc compared with >>> -Fp. If the data is highly compressible, the difference can be orders >>> of magnitude. >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c >>> 12351763 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c >>> 21890708 >>> >>> That's not true for gzip: >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c >>> 2118869 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c >>> 2115832 >>> >>> The function ought to at least use streaming mode, so each block/row >>> isn't compressioned in isolation. 003 is a simple patch to use >>> streaming mode, which improves the -Fc case: >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c >>> 15178283 >>> >>> However, that still flushes the compression buffer, writing a block >>> header, for every row. With a single-column table, pg_dump -Fc -Z lz4 >>> still outputs ~10% *more* data than with no compression at all. And >>> that's for compressible data. 
>>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c >>> 12890296 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c >>> 11890296 >>> >>> I think this should use the LZ4F API with frames, which are buffered to >>> avoid outputting a header for every single row. The LZ4F format isn't >>> compatible with the LZ4 format, so (unlike changing to the streaming >>> API) that's not something we can change in a bugfix release. I consider >>> this an Opened Item. >>> >>> With the LZ4F API in 004, -Fp and -Fc are essentially the same size >>> (like gzip). (Oh, and the output is three times smaller, too.) >>> >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c >>> 4155448 >>> $ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c >>> 4156548 >> >> Thanks. Those are definitely interesting improvements/optimizations! >> >> I suggest we track them as a separate patch series - please add them to >> the CF app (I guess you'll have to add them to 2023-07 at this point, >> but we can get them in, I think). > > Thanks for looking. I'm not sure if I'm the best person to write/submit > the patch to implement that for LZ4. Georgios, would you want to take > on this change ? > > I think that needs to be changed for v16, since 1) LZ4F works so much > better like this, and 2) we can't change it later without breaking > compatibility of the dumpfiles by changing the header with another name > other than "lz4". Also, I imagine we'd want to continue supporting the > ability to *restore* a dumpfile using the old(current) format, which > would be untestable code unless we also preserved the ability to write > it somehow (like -Z lz4-old). > I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to lz4f, doesn't that mean it (e.g. restore) won't work on systems that only have older lz4 version? 
What would/should happen if we take a backup compressed with lz4f, and then try restoring it on an older system where lz4 does not support lz4f? Maybe if the lz4f format is incompatible with regular lz4, we should treat it as a separate compression method 'lz4f'? I'm mostly afk until the end of the week, but I tried searching for lz4f info - the results are not particularly enlightening, unfortunately. AFAICS this only applies to lz4f stuff. Or would the streaming mode be a breaking change too? > One issue is that LZ4F_createCompressionContext() and > LZ4F_compressBegin() ought to be called in InitCompressorLZ4(). It > seems like it might *need* to be called there to avoid exactly the kind > of issue that I reported with empty LOs with gzip. But > InitCompressorLZ4() isn't currently passed the ArchiveHandle, so can't > write the header. And LZ4CompressorState has a simple char *buf, and > not a more elaborate data structure like zlib. You could work around > that by storing the "len" of the existing buffer, and flushing > it in EndCompressorLZ4(), but that adds needless complexity to the Write > and End functions. Maybe the Init function should be passed the AH. > Not sure, but looking at GzipCompressorState I see the only extra thing it has (compared to LZ4CompressorState) is "z_streamp". I can't experiment with this until the end of this week, so perhaps that's not workable, but wouldn't it be better to add a similar field into LZ4CompressorState? Passing AH to the init function seems like a violation of abstraction. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 09, 2023 at 06:58:20PM +0100, Tomas Vondra wrote: > I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to > lz4f, doesn't that mean it (e.g. restore) won't work on systems that > only have older lz4 version? What would/should happen if we take a backup > compressed with lz4f, and then try restoring it on an older system where > lz4 does not support lz4f? You seem to be thinking about LZ4F as a weird, new innovation I'm experimenting with, but compress_lz4.c already uses LZ4F for its "file" API. LZ4F is also what's written by the lz4 CLI tool, and I found that LZ4F has been included in the library for ~8 years: https://github.com/lz4/lz4/releases?page=2 r126 Dec 24, 2014 New : lz4frame API is now integrated into liblz4 > Maybe if the lz4f format is incompatible with regular lz4, we should treat > it as a separate compression method 'lz4f'? > > I'm mostly afk until the end of the week, but I tried searching for lz4f > info - the results are not particularly enlightening, unfortunately. > > AFAICS this only applies to lz4f stuff. Or would the streaming mode be a > breaking change too? Streaming mode outputs the same format as the existing code, but gives better compression. We could (theoretically) change it in a bugfix release, and old output would still be restorable (I think new output would even be restorable with the old versions of pg_restore). But that's not true for LZ4F. The benefit there is that it avoids outputting a separate block for each row. That's essential for narrow tables, for which the block header currently being written has an overhead several times larger than the data. -- Justin
On Fri, Mar 10, 2023 at 07:05:49AM -0600, Justin Pryzby wrote: > On Thu, Mar 09, 2023 at 06:58:20PM +0100, Tomas Vondra wrote: >> I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to >> lz4f, doesn't that mean it (e.g. restore) won't work on systems that >> only have older lz4 version? What would/should happen if we take backup >> compressed with lz4f, an then try restoring it on an older system where >> lz4 does not support lz4f? > > You seem to be thinking about LZ4F as a weird, new innovation I'm > experimenting with, but compress_lz4.c already uses LZ4F for its "file" > API. Note: we already use lz4 frames in pg_receivewal (for WAL) and pg_basebackup (bbstreamer). -- Michael
Attachments
Hello, 23.02.2023 23:24, Tomas Vondra wrote: > On 2/23/23 16:26, Tomas Vondra wrote: >> Thanks for v30 with the updated commit messages. I've pushed 0001 after >> fixing a comment typo and removing (I think) an unnecessary change in an >> error message. >> >> I'll give the buildfarm a bit of time before pushing 0002 and 0003. >> > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), > and marked the CF entry as committed. Thanks for the patch! > > I wonder how difficult would it be to add the zstd compression, so that > we don't have the annoying "unsupported" cases. With the patch 0003 committed, a single warning -Wtype-limits appeared in the master branch: $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8 compress_lz4.c: In function ‘LZ4File_gets’: compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits] 492 | if (dsize < 0) | (I wonder, is it accidental that there are no other places that trigger the warning, or had some buildfarm animals this check enabled before?) It is not a false positive, as can be proved by 002_pg_dump.pl modified as follows: - program => $ENV{'LZ4'}, + program => 'mv', args => [ - '-z', '-f', '--rm', "$tempdir/compression_lz4_dir/blobs.toc", "$tempdir/compression_lz4_dir/blobs.toc.lz4", ], }, Added diagnostic logging shows: LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615 and pg_restore fails with: error: invalid line in large object TOC file ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": "????" Best regards, Alexander
------- Original Message ------- On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin <exclusion@gmail.com> wrote: > Hello, > 23.02.2023 23:24, Tomas Vondra wrote: > > > On 2/23/23 16:26, Tomas Vondra wrote: > > > > > Thanks for v30 with the updated commit messages. I've pushed 0001 after > > > fixing a comment typo and removing (I think) an unnecessary change in an > > > error message. > > > > > > I'll give the buildfarm a bit of time before pushing 0002 and 0003. > > > > I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), > > and marked the CF entry as committed. Thanks for the patch! > > > > I wonder how difficult would it be to add the zstd compression, so that > > we don't have the annoying "unsupported" cases. > > > With the patch 0003 committed, a single warning -Wtype-limits appeared in the > master branch: > $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8 > compress_lz4.c: In function ‘LZ4File_gets’: > compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is > always false [-Wtype-limits] > 492 | if (dsize < 0) > | Thank you Alexander. Please find attached an attempt to address it. > (I wonder, is it accidental that there no other places that triggers > the warning, or some buildfarm animals had this check enabled before?) I can not answer about the buildfarms. Do you think that adding an explicit check for this warning in meson would help? I am a bit uncertain as I think that type-limits are included in extra. 
@@ -1748,6 +1748,7 @@ common_warning_flags = [ '-Wshadow=compatible-local', # This was included in -Wall/-Wformat in older GCC versions '-Wformat-security', + '-Wtype-limits', ] > > It is not a false positive as can be proved by the 002_pg_dump.pl modified as > follows: > - program => $ENV{'LZ4'}, > > + program => 'mv', > > args => [ > > - '-z', '-f', '--rm', > "$tempdir/compression_lz4_dir/blobs.toc", > "$tempdir/compression_lz4_dir/blobs.toc.lz4", > ], > }, Correct, it is not a false positive. The existing testing framework provides limited support for exercising error branches. Especially so when those are dependent on generated output. > A diagnostic logging added shows: > LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615 > > and pg_restore fails with: > error: invalid line in large object TOC file > ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": "????" It is a good thing that the restore fails with bad input. Yet it should have failed earlier. The attached makes certain it does fail earlier. Cheers, //Georgios > > Best regards, > Alexander
Attachments
Hi Georgios, 11.03.2023 13:50, gkokolatos@pm.me wrote: > I can not answer about the buildfarms. Do you think that adding an explicit > check for this warning in meson would help? I am a bit uncertain as I think > that type-limits are included in extra. > > @@ -1748,6 +1748,7 @@ common_warning_flags = [ > '-Wshadow=compatible-local', > # This was included in -Wall/-Wformat in older GCC versions > '-Wformat-security', > + '-Wtype-limits', > ] I'm not sure that I can promote additional checks (or determine where to put them), but if some patch introduces a warning of a type that wasn't present before, I think it's worth eliminating the warning (if it is sensible) to keep the source code check baseline at the same level or even lift it up gradually. I've also found that the same commit introduced a single instance of the analyzer-possible-null-argument warning: CPPFLAGS="-Og -fanalyzer -Wno-analyzer-malloc-leak -Wno-analyzer-file-leak -Wno-analyzer-null-dereference -Wno-analyzer-shift-count-overflow -Wno-analyzer-free-of-non-heap -Wno-analyzer-null-argument -Wno-analyzer-double-free -Wanalyzer-possible-null-argument" ./configure --with-lz4 -q && make -s -j8 compress_io.c: In function ‘hasSuffix’: compress_io.c:158:47: warning: use of possibly-NULL ‘filename’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument] 158 | int filenamelen = strlen(filename); | ^~~~~~~~~~~~~~~~ ‘InitDiscoverCompressFileHandle’: events 1-3 ... (I use gcc-11.3.) As I can see, many existing uses of strdup() are followed by a check for a null result, so maybe it's a common practice and a similar check should be added in InitDiscoverCompressFileHandle(). (There are also a couple of other warnings introduced with the lz4 compression patches, but those ones are not unique, so maybe they aren't worth fixing.) >> It is a good thing that the restore fails with bad input. Yet it should >> have failed earlier. The attached makes certain it does fail earlier. >> Thanks! 
Your patch definitely fixes the issue. Best regards, Alexander
On 11.03.23 07:00, Alexander Lakhin wrote: > Hello, > 23.02.2023 23:24, Tomas Vondra wrote: >> On 2/23/23 16:26, Tomas Vondra wrote: >>> Thanks for v30 with the updated commit messages. I've pushed 0001 after >>> fixing a comment typo and removing (I think) an unnecessary change in an >>> error message. >>> >>> I'll give the buildfarm a bit of time before pushing 0002 and 0003. >>> >> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), >> and marked the CF entry as committed. Thanks for the patch! >> >> I wonder how difficult would it be to add the zstd compression, so that >> we don't have the annoying "unsupported" cases. > > With the patch 0003 committed, a single warning -Wtype-limits appeared > in the > master branch: > $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8 > compress_lz4.c: In function ‘LZ4File_gets’: > compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< > 0’ is always false [-Wtype-limits] > 492 | if (dsize < 0) > | > (I wonder, is it accidental that there no other places that triggers > the warning, or some buildfarm animals had this check enabled before?) I think there is an underlying problem in this code that it dances back and forth between size_t and int in an unprincipled way. In the code that triggers the warning, dsize is size_t. dsize is the return from LZ4File_read_internal(), which is declared to return int. The variable that LZ4File_read_internal() returns in the success case is size_t, but in case of an error it returns -1. (So the code that is warning is meaning to catch this error case, but it won't ever work.) Further below LZ4File_read_internal() calls LZ4File_read_overflow(), which is declared to return int, but in some cases it returns fs->overflowlen, which is size_t. This should be cleaned up. AFAICT, the upstream API in lz4.h uses int for size values, but lz4frame.h uses size_t, so I don't know what the correct approach is.
On 3/12/23 11:07, Peter Eisentraut wrote: > On 11.03.23 07:00, Alexander Lakhin wrote: >> Hello, >> 23.02.2023 23:24, Tomas Vondra wrote: >>> On 2/23/23 16:26, Tomas Vondra wrote: >>>> Thanks for v30 with the updated commit messages. I've pushed 0001 after >>>> fixing a comment typo and removing (I think) an unnecessary change >>>> in an >>>> error message. >>>> >>>> I'll give the buildfarm a bit of time before pushing 0002 and 0003. >>>> >>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), >>> and marked the CF entry as committed. Thanks for the patch! >>> >>> I wonder how difficult would it be to add the zstd compression, so that >>> we don't have the annoying "unsupported" cases. >> >> With the patch 0003 committed, a single warning -Wtype-limits appeared >> in the >> master branch: >> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8 >> compress_lz4.c: In function ‘LZ4File_gets’: >> compress_lz4.c:492:19: warning: comparison of unsigned expression in >> ‘< 0’ is always false [-Wtype-limits] >> 492 | if (dsize < 0) >> | >> (I wonder, is it accidental that there no other places that triggers >> the warning, or some buildfarm animals had this check enabled before?) > > I think there is an underlying problem in this code that it dances back > and forth between size_t and int in an unprincipled way. > > In the code that triggers the warning, dsize is size_t. dsize is the > return from LZ4File_read_internal(), which is declared to return int. > The variable that LZ4File_read_internal() returns in the success case is > size_t, but in case of an error it returns -1. (So the code that is > warning is meaning to catch this error case, but it won't ever work.) > Further below LZ4File_read_internal() calls LZ4File_read_overflow(), > which is declared to return int, but in some cases it returns > fs->overflowlen, which is size_t. > I agree. 
I just got home so I looked at this only very briefly, but I think it's clearly wrong to assign the LZ4File_read_internal() result to a size_t variable (and it seems to me LZ4File_gets makes the same mistake with the LZ4File_read_internal() result). I'll get this fixed early next week, I'm too tired to do that now without likely causing further issues. > This should be cleaned up. > > AFAICT, the upstream API in lz4.h uses int for size values, but > lz4frame.h uses size_t, so I don't know what the correct approach is. Yeah, that's a good point. I think Justin is right that we should be using the LZ4F stuff, so ultimately we'll probably switch to size_t. But IMO it's definitely better to correct the current code first, and only then switch to LZ4F (from one correct state to another). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/11/23 11:50, gkokolatos@pm.me wrote: > ------- Original Message ------- > On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin <exclusion@gmail.com> wrote: > >> Hello, >> 23.02.2023 23:24, Tomas Vondra wrote: >> >>> On 2/23/23 16:26, Tomas Vondra wrote: >>> >>>> Thanks for v30 with the updated commit messages. I've pushed 0001 after >>>> fixing a comment typo and removing (I think) an unnecessary change in an >>>> error message. >>>> >>>> I'll give the buildfarm a bit of time before pushing 0002 and 0003. >>> >>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.), >>> and marked the CF entry as committed. Thanks for the patch! >>> >>> I wonder how difficult would it be to add the zstd compression, so that >>> we don't have the annoying "unsupported" cases. >> >> >> With the patch 0003 committed, a single warning -Wtype-limits appeared in the >> master branch: >> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8 >> compress_lz4.c: In function ‘LZ4File_gets’: >> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is >> always false [-Wtype-limits] >> 492 | if (dsize < 0) >> | > > Thank you Alexander. Please find attached an attempt to address it. > >> (I wonder, is it accidental that there no other places that triggers >> the warning, or some buildfarm animals had this check enabled before?) > > I can not answer about the buildfarms. Do you think that adding an explicit > check for this warning in meson would help? I am a bit uncertain as I think > that type-limits are included in extra. 
> > @@ -1748,6 +1748,7 @@ common_warning_flags = [ > '-Wshadow=compatible-local', > # This was included in -Wall/-Wformat in older GCC versions > '-Wformat-security', > + '-Wtype-limits', > ] > >> >> It is not a false positive as can be proved by the 002_pg_dump.pl modified as >> follows: >> - program => $ENV{'LZ4'}, >> >> + program => 'mv', >> >> args => [ >> >> - '-z', '-f', '--rm', >> "$tempdir/compression_lz4_dir/blobs.toc", >> "$tempdir/compression_lz4_dir/blobs.toc.lz4", >> ], >> }, > > Correct, it is not a false positive. The existing testing framework provides > limited support for exercising error branches. Especially so when those are > dependent on generated output. > >> A diagnostic logging added shows: >> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615 >> >> and pg_restore fails with: >> error: invalid line in large object TOC file >> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": "????" > > It is a good thing that the restore fails with bad input. Yet it should > have failed earlier. The attached makes certain it does fail earlier. > Thanks for the patch. I did look if there are other places that might have the same issue, and I think there are - see attached 0002. For example LZ4File_write is declared to return size_t, but then it also does if (LZ4F_isError(status)) { fs->errcode = status; return -1; } That won't work :-( And these issues may not be restricted to lz4 code - Gzip_write is declared to return size_t, but it does return gzwrite(gzfp, ptr, size); and gzwrite returns int. Although, maybe that's correct, because gzwrite() is "0 on error" so maybe this is fine ... However, Gzip_read assigns gzread() to size_t, and that does not seem great. It probably will still trigger the following pg_fatal() because it'd be very lucky to match the expected 'size', but it's confusing. I wonder whether CompressorState should use int or size_t for the read_func/write_func callbacks. 
I guess no option is perfect, i.e. no data type will work for all compression libraries we might use (lz4 uses int while lz4f uses size_t, so there's that). It's a bit weird that the "open" functions return int and the read/write ones size_t. Maybe we should stick to int, which is what the old functions (cfwrite etc.) did. But I think the actual problem here is that the API does not clearly define how errors are communicated. I mean, it's nice to return the value returned by the library function without "mangling" it by conversion to size_t, but what if the libraries communicate errors in different ways? Some may return "0" while others may return "-1". I think the right approach is to handle all library errors and not just let them through. So Gzip_write() needs to check the return value, and either call pg_fatal() or translate it to an error defined by the API. For example we might say "returns 0 on error" and then translate all library-specific errors to that. While looking at the code I realized a couple of function comments don't say what's returned in case of error, etc. So 0004 adds those. 0003 is a couple of minor assorted comments/questions: - Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c? - Why are the LZ4 buffer sizes different (ZLIB has both at 4kB)? - I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible for LZ4F_compressBound to return a value this small (especially for a 16kB input buffer)? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
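The "translate library errors to one API convention" idea from the previous message can be sketched as follows. `fake_gzwrite` is a stand-in for gzwrite() (which returns the byte count on success and 0 on error); the API-level callback converts whatever the library reports into a single contract, here "1 on success, 0 on error". The names are illustrative, not the committed pg_dump code:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for gzwrite(): returns the byte count, or 0 on error */
static int
fake_gzwrite(const void *ptr, unsigned len, int simulate_error)
{
	(void) ptr;
	return simulate_error ? 0 : (int) len;
}

/*
 * API-level write callback: the library's own error signalling is
 * handled here, so every compression method exposes the same contract
 * to its callers (1 on success, 0 on error).
 */
static int
gzip_write(const void *ptr, size_t size, int simulate_error)
{
	int			ret = fake_gzwrite(ptr, (unsigned) size, simulate_error);

	/* gzwrite reports errors as 0; a short write is also a failure */
	return (ret > 0 && (size_t) ret == size) ? 1 : 0;
}
```

With this shape, an LZ4-based write_func that gets a negative LZ4F error code would do its own check and return the same 0, so callers never need to know which library sits underneath.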
Hi Justin, Thanks for the patch. On 3/8/23 02:45, Justin Pryzby wrote: > On Wed, Mar 01, 2023 at 04:52:49PM +0100, Tomas Vondra wrote: >> Thanks. That seems correct to me, but I find it somewhat confusing, >> because we now have >> >> DeflateCompressorInit vs. InitCompressorGzip >> >> DeflateCompressorEnd vs. EndCompressorGzip >> >> DeflateCompressorData - The name doesn't really say what it does (would >> be better to have a verb in there, I think). >> >> I wonder if we can make this somehow clearer? > > To move things along, I updated Georgios' patch: > > Rename DeflateCompressorData() to DeflateCompressorCommon(); Hmmm, I don't find "common" any clearer than "data" :-( There needs to at least be a comment explaining what "common" does. > Rearrange functions to their original order allowing a cleaner diff to the prior code; OK. I wasn't very enthusiastic about this initially, but after thinking about it a bit I think it's meaningful to make diffs clearer. But I don't see much difference with/without the patch. The git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c Produces ~25k diff with/without the patch. What am I doing wrong? > Change pg_fatal() to an assertion+comment; Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We could add such protections against "impossible" stuff to a zillion other places and the confusion likely outweighs the benefits. > Update the commit message and fix a few typos; > Thanks. I don't want to annoy you too much, but could you split the patch into the "empty-data" fix and all the other changes (rearranging functions etc.)? I'd rather not mix those in the same commit. >> Also, InitCompressorGzip says this: >> >> /* >> * If the caller has defined a write function, prepare the necessary >> * state. Avoid initializing during the first write call, because End >> * may be called without ever writing any data. 
>> */ >> if (cs->writeF) >> DeflateCompressorInit(cs); >> >> Does it actually make sense to not have writeF defined in some cases? > > InitCompressor is being called for either reading or writing, either of > which could be null: > > src/bin/pg_dump/pg_backup_custom.c: ctx->cs = AllocateCompressor(AH->compression_spec, > src/bin/pg_dump/pg_backup_custom.c- NULL, > src/bin/pg_dump/pg_backup_custom.c- _CustomWriteFunc); > -- > src/bin/pg_dump/pg_backup_custom.c: cs = AllocateCompressor(AH->compression_spec, > src/bin/pg_dump/pg_backup_custom.c- _CustomReadFunc, NULL); > > It's confusing that the comment says "Avoid initializing...". What it > really means is "Initialize eagerly...". But that makes more sense in > the context of the commit message for this bugfix than in a comment. So > I changed that too. > > + /* If deflation was initialized, finalize it */ > + if (cs->private_data) > + DeflateCompressorEnd(AH, cs); > > Maybe it'd be more clear if this used "if (cs->writeF)", like in the > init function ? > Yeah, if the two checks are equivalent, it'd be better to stick to the same check everywhere. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Monday, March 13th, 2023 at 10:47 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > Change pg_fatal() to an assertion+comment; > > > Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We > could add such protections against "impossible" stuff to a zillion other > places and the confusion likely outweighs the benefits. > A minor note to add is to not ignore the lessons learned from a7885c9bb. For example, as the testing framework stands, one can not test that the contents of the custom format are indeed compressed. One can infer it by examining the header of the produced dump and searching for the compression flag. The code responsible for writing the header and the code responsible for actually dealing with data is not the same. Also, the compression library itself will happily read and write uncompressed data. A pg_fatal, assertion, or similar, is the only guard rail against this kind of error. Without it, the tests will continue passing even after e.g. a wrong initialization of the API. It was such a case that led to a7885c9bb in the first place. I do think that we wish it to be an "impossible" case. Without such a guard rail, it will also remain an untested case with some history. Of course I will not object to removing it, if you think that is more confusing than useful. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
------- Original Message ------- On Monday, March 13th, 2023 at 9:21 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 3/11/23 11:50, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin exclusion@gmail.com wrote: > > > > > Hello, > > > 23.02.2023 23:24, Tomas Vondra wrote: > > > Thanks for the patch. > > I did look if there are other places that might have the same issue, and > I think there are - see attached 0002. For example LZ4File_write is > declared to return size_t, but then it also does > > if (LZ4F_isError(status)) > { > fs->errcode = status; > > return -1; > } > > That won't work :-( You are right. It is confusing. > > And these issues may not be restricted to lz4 code - Gzip_write is > declared to return size_t, but it does > > return gzwrite(gzfp, ptr, size); > > and gzwrite returns int. Although, maybe that's correct, because > gzwrite() is "0 on error" so maybe this is fine ... > > However, Gzip_read assigns gzread() to size_t, and that does not seem > great. It probably will still trigger the following pg_fatal() because > it'd be very lucky to match the expected 'size', but it's confusing. Agreed. > > > I wonder whether CompressorState should use int or size_t for the > read_func/write_func callbacks. I guess no option is perfect, i.e. no > data type will work for all compression libraries we might use (lz4 uses > int while lz4f uses size_t, to there's that). > > It's a bit weird the "open" functions return int and the read/write > size_t. Maybe we should stick to int, which is what the old functions > (cfwrite etc.) did. > You are right. These functions are modeled on open/fread/fwrite etc., and have kept their return types. Their callers do check the return value of read_func and write_func against the requested number of bytes to be transferred.
> > But I think the actual problem here is that the API does not clearly > define how errors are communicated. I mean, it's nice to return the > value returned by the library function without "mangling" it by > conversion to size_t, but what if the libraries communicate errors in > different way? Some may return "0" while others may return "-1". Agreed. > > I think the right approach is to handle all library errors and not just > let them through. So Gzip_write() needs to check the return value, and > either call pg_fatal() or translate it to an error defined by the API. It makes sense. It will change some of the behaviour of the callers, mostly regarding what constitutes an error and what error message is emitted. This is a reasonable change though. > > For example we might say "returns 0 on error" and then translate all > library-specific errors to that. Ok. > While looking at the code I realized a couple function comments don't > say what's returned in case of error, etc. So 0004 adds those. > > 0003 is a couple minor assorted comments/questions: > > - Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c? It would make things clearer. > - Why are LZ4 buffer sizes different (ZLIB has both 4kB)? Clearly some comments are needed, if the difference makes sense. > - I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible > for LZ4F_compressBound to return value this small (especially for 16kB > input buffer)? > I would recommend keeping it. Earlier versions of the library do not have it; later versions advise using it. Would you mind me trying to come up with a patch to address your points? Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
On 3/14/23 16:18, gkokolatos@pm.me wrote: > ...> Would you mind me trying to come with a patch to address your points? > That'd be great, thanks. Please keep it split into smaller patches - two might work, with one patch for "cosmetic" changes and the other tweaking the API error-handling stuff. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/14/23 12:07, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > On Monday, March 13th, 2023 at 10:47 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > >> >>> Change pg_fatal() to an assertion+comment; >> >> >> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We >> could add such protections against "impossible" stuff to a zillion other >> places and the confusion likely outweighs the benefits. >> > > A minor note to add is to not ignore the lessons learned from a7885c9bb. > > For example, as the testing framework stands, one can not test that the > contents of the custom format are indeed compressed. One can infer it by > examining the header of the produced dump and searching for the > compression flag. The code responsible for writing the header and the > code responsible for actually dealing with data, is not the same. Also, > the compression library itself will happily read and write uncompressed > data. > > A pg_fatal, assertion, or similar, is the only guard rail against this > kind of error. Without it, the tests will continue passing even after > e.g. a wrong initialization of the API. It was such a case that lead to > a7885c9bb in the first place. I do think that we wish it to be an > "impossible" case. Also it will be an untested case with some history > without such a guard rail. > So is the pg_fatal() dead code or not? My understanding was that it's not really reachable, and the main purpose is to remind people this is not possible. Or am I mistaken/confused? If it's reachable, can we test it? AFAICS we don't, per the coverage reports. If it's just a protection against incorrect API initialization, then an assert is the right solution, I think. With a proper comment. But can't we actually verify that *during* the initialization? Also, how come WriteDataToArchiveLZ4() doesn't need this protection too? Or is that due to gzip being the default compression method?
> Of course I will not object to removing it, if you think that is more > confusing than useful. > Not sure, I have a feeling I don't quite understand in what situation this actually helps. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote: > > Rearrange functions to their original order allowing a cleaner diff to the prior code; > > OK. I wasn't very enthusiastic about this initially, but after thinking > about it a bit I think it's meaningful to make diffs clearer. But I > don't see much difference with/without the patch. The > > git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c > > Produces ~25k diff with/without the patch. What am I doing wrong? Do you mean 25 kB of diff? I agree that the statistics of the diff output don't change a lot: 1 file changed, 201 insertions(+), 570 deletions(-) 1 file changed, 198 insertions(+), 548 deletions(-) But try reading the diff while looking for the cause of a bug. It's the difference between reading 50 two-line changes and reading a hunk that replaces 100 lines with a different 100 lines, with empty/unrelated lines randomly thrown in as context. When the diff is readable, the pg_fatal() also stands out. > > Change pg_fatal() to an assertion+comment; > > Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We > could add such protections against "impossible" stuff to a zillion other > places and the confusion likely outweighs the benefits. > > > Update the commit message and fix a few typos; > > Thanks. I don't want to annoy you too much, but could you split the > patch into the "empty-data" fix and all the other changes (rearranging > functions etc.)? I'd rather not mix those in the same commit. I don't know if that makes sense? The "empty-data" fix creates a new function called DeflateCompressorInit(). My proposal was to add the new function in the same place in the file as it used to be.
Anyway, I'll wait while the community continues discussion about the pg_fatal(). -- Justin
------- Original Message ------- On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 3/14/23 16:18, gkokolatos@pm.me wrote: > > > ...> Would you mind me trying to come with a patch to address your points? > > > That'd be great, thanks. Please keep it split into smaller patches - two > might work, with one patch for "cosmetic" changes and the other tweaking > the API error-handling stuff. Please find attached a set for it. I will admit that the splitting of the series might not be ideal, nor exactly what you requested. It is split into what seemed like logical units. Please advise what a better split could look like. 0001 is unifying types and return values on the API 0002 is addressing the constant definitions 0003 is your previous 0004 adding comments As far as the error handling is concerned, you had said upthread: > I think the right approach is to handle all library errors and not just > let them through. So Gzip_write() needs to check the return value, and > either call pg_fatal() or translate it to an error defined by the API. While working on it, I thought it would be clearer and more consistent for the pg_fatal() to be called by the caller of the individual functions. Each individual function can keep track of the specifics of the error internally. Then the caller, upon detecting that there was an error by checking the return value, can call pg_fatal() with a uniform error message and then add the specifics by calling get_error_func(). Thoughts? Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
On 3/16/23 18:04, gkokolatos@pm.me wrote: > > ------- Original Message ------- > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> >> On 3/14/23 16:18, gkokolatos@pm.me wrote: >> >>> ...> Would you mind me trying to come with a patch to address your points? >> >> >> That'd be great, thanks. Please keep it split into smaller patches - two >> might work, with one patch for "cosmetic" changes and the other tweaking >> the API error-handling stuff. > > Please find attached a set for it. I will admit that the splitting in the > series might not be ideal and what you requested. It is split on what > seemed as a logical units. Please advice how a better split can look like. > > 0001 is unifying types and return values on the API > 0002 is addressing the constant definitions > 0003 is your previous 0004 adding comments > Thanks. I think the split seems reasonable - the goal was to not mix different changes, and from that POV it works. I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I mean, gzread/gzwrite returns int, so how does renaming the size_t variable solve the issue of negative values for errors? I mean, this - size_t ret; + size_t gzret; - ret = gzread(gzfp, ptr, size); + gzret = gzread(gzfp, ptr, size); means we still lost the information gzread() returned a negative value, no? We'll still probably trigger an error, but it's a bit weird. ISTM all this kinda assumes we're processing chunks of memory small enough that we'll never actually overflow int - I did check what the code in 15 does, and it seems use int and size_t quite arbitrarily. For example cfread() seems quite sane: int cfread(void *ptr, int size, cfp *fp) { int ret; ... ret = gzread(fp->compressedfp, ptr, size); ... return ret; } but then _PrintFileData() happily stashes it into a size_t, ignoring the signedness. Surely, if static void _PrintFileData(ArchiveHandle *AH, char *filename) { size_t cnt; ... 
while ((cnt = cfread(buf, buflen, cfp))) { ahwrite(buf, 1, cnt, AH); } ... } Unless I'm missing something, if gzread() ever returns -1 or some other negative error value, we'll cast it to size_t, the while condition will evaluate to "true" and we'll happily chew on some random chunk of data. So the confusion is (at least partially) a preexisting issue ... For gzwrite() it seems to be fine, because that only returns 0 on error. OTOH it's defined to take 'int size' but then we happily pass size_t values to it. As I wrote earlier, this apparently assumes we never need to deal with buffers larger than int, and I don't think we have the ambition to relax that (I'm not sure it's even needed / possible). I see the read/write functions are now defined as int, but we only ever return 0/1 from them, and then interpret that as bool. Why not define it like that? I don't think we need to adhere to the custom that everything returns "int". This is an internal API. Or if we want to stick to int, I'd define meaningful "nice" constants for 0/1. 0002 seems fine to me. I see you've ditched the idea of having two separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine with me, although I wonder if this might have a negative impact on performance or something (but I doubt that). 0003 seems fine too. > As far as the error handling is concerned, you had said upthread: >> I think the right approach is to handle all library errors and not just >> let them through. So Gzip_write() needs to check the return value, and >> either call pg_fatal() or translate it to an error defined by the API. > While working on it, I thought it would be clearer and more consistent > for the pg_fatal() to be called by the caller of the individual functions. > Each individual function can keep track of the specifics of the error > internally.
Then the caller upon detecting that there was an error by > checking the return value, can call pg_fatal() with a uniform error > message and then add the specifics by calling the get_error_func(). > I agree it's cleaner the way you did it. I was thinking that with each compression function handling errors internally, the callers would not need to do that. But I hadn't realized there's logic to detect ENOSPC and so on, and we'd need to duplicate that in every compression func. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
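The reporting shape Georgios describes and Tomas agrees with can be sketched like this. All names here are illustrative assumptions (a stripped-down handle struct, a deliberately failing write callback), not the committed pg_dump API: the callback returns success/failure per the uniform convention, and the caller builds one uniform message, appending the compressor-specific detail obtained via a get_error_func-style callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stripped-down, hypothetical handle: write + error-detail callbacks */
typedef struct CompressFileHandle
{
	int			(*write_func) (const void *ptr, size_t size,
							   struct CompressFileHandle *fh);
	const char *(*get_error_func) (struct CompressFileHandle *fh);
	const char *last_error;		/* library-specific detail, set on failure */
} CompressFileHandle;

/* A write callback that always fails, recording its own specifics */
static int
failing_write(const void *ptr, size_t size, CompressFileHandle *fh)
{
	(void) ptr;
	(void) size;
	fh->last_error = "no space left on device";
	return 0;					/* 0 on error, per the API convention */
}

static const char *
get_error(CompressFileHandle *fh)
{
	return fh->last_error;
}

/*
 * Caller: checks the uniform return value, emits a uniform message and
 * appends the compressor-specific detail.  The real code would call
 * pg_fatal() here; this sketch writes into a buffer instead.
 */
static int
write_or_report(CompressFileHandle *fh, const void *buf, size_t len,
				char *msg, size_t msglen)
{
	if (!fh->write_func(buf, len, fh))
	{
		snprintf(msg, msglen, "could not write to output file: %s",
				 fh->get_error_func(fh));
		return 0;
	}
	return 1;
}
```

The benefit is exactly the one discussed: logic like ENOSPC detection and message wording lives in one caller, while each compression method only tracks its own error internals.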
On 3/16/23 01:20, Justin Pryzby wrote: > On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote: >>> Rearrange functions to their original order allowing a cleaner diff to the prior code; >> >> OK. I wasn't very enthusiastic about this initially, but after thinking >> about it a bit I think it's meaningful to make diffs clearer. But I >> don't see much difference with/without the patch. The >> >> git diff --diff-algorithm=minimal -w e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c >> >> Produces ~25k diff with/without the patch. What am I doing wrong? > > Do you mean 25 kB of diff ? Yes, if you redirect the git-diff to a file, it's a 25kB file. > I agree that the statistics of the diff output don't change a lot: > > 1 file changed, 201 insertions(+), 570 deletions(-) > 1 file changed, 198 insertions(+), 548 deletions(-) > > But try reading the diff while looking for the cause of a bug. It's the > difference between reading 50, two-line changes, and reading a hunk that > replaces 100 lines with a different 100 lines, with empty/unrelated > lines randomly thrown in as context. > > When the diff is readable, the pg_fatal() also stands out. > I don't know, maybe I'm doing something wrong or maybe I just am bad at looking at diffs, but if I apply the patch you submitted on 8/3 and do the git-diff above (output attached), it seems pretty incomprehensible to me :-( I don't see 50 two-line changes (I certainly wouldn't be able to identify the root cause of the bug based on that). >>> Change pg_fatal() to an assertion+comment; >> >> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We >> could add such protections against "impossible" stuff to a zillion other >> places and the confusion likely outweighs the benefits. >> >>> Update the commit message and fix a few typos; >> >> Thanks. 
I don't want to annoy you too much, but could you split the >> patch into the "empty-data" fix and all the other changes (rearranging >> functions etc.)? I'd rather not mix those in the same commit. > > I don't know if that makes sense? The "empty-data" fix creates a new > function called DeflateCompressorInit(). My proposal was to add the new > function in the same place in the file as it used to be. > Got it. In that case I agree it's fine to do that in a single commit. > The patch also moves the pg_fatal() that's being removed. I don't think > it's going to look any cleaner to read a history involving the > pg_fatal() first being added, then moved, then removed. Anyway, I'll > wait while the community continues discussion about the pg_fatal(). > I think the agreement was to replace the pg_fatal with an assert, and I see your patch already does that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
On Thu, Mar 16, 2023 at 11:30:50PM +0100, Tomas Vondra wrote: > On 3/16/23 01:20, Justin Pryzby wrote: > > But try reading the diff while looking for the cause of a bug. It's the > > difference between reading 50, two-line changes, and reading a hunk that > > replaces 100 lines with a different 100 lines, with empty/unrelated > > lines randomly thrown in as context. > > I don't know, maybe I'm doing something wrong or maybe I just am bad at > looking at diffs, but if I apply the patch you submitted on 8/3 and do > the git-diff above (output attached), it seems pretty incomprehensible > to me :-( I don't see 50 two-line changes (I certainly wouldn't be able > to identify the root cause of the bug based on that). It's true that most of the diff is still incomprehensible... But look at the part relevant to the "empty-data" bug: [... incomprehensible changes elided ...] > static void > -InitCompressorZlib(CompressorState *cs, int level) > +DeflateCompressorInit(CompressorState *cs) > { > + GzipCompressorState *gzipcs; > z_streamp zp; > > - zp = cs->zp = (z_streamp) pg_malloc(sizeof(z_stream)); > + gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState)); > + zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream)); > zp->zalloc = Z_NULL; > zp->zfree = Z_NULL; > zp->opaque = Z_NULL; > > /* > - * zlibOutSize is the buffer size we tell zlib it can output to. We > - * actually allocate one extra byte because some routines want to append a > - * trailing zero byte to the zlib output. > + * outsize is the buffer size we tell zlib it can output to. We actually > + * allocate one extra byte because some routines want to append a trailing > + * zero byte to the zlib output. 
> */ > - cs->zlibOut = (char *) pg_malloc(ZLIB_OUT_SIZE + 1); > - cs->zlibOutSize = ZLIB_OUT_SIZE; > + gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1); > + gzipcs->outsize = ZLIB_OUT_SIZE; > > - if (deflateInit(zp, level) != Z_OK) > - pg_fatal("could not initialize compression library: %s", > - zp->msg); > + /* -Z 0 uses the "None" compressor -- not zlib with no compression */ > + Assert(cs->compression_spec.level != 0); > + > + if (deflateInit(zp, cs->compression_spec.level) != Z_OK) > + pg_fatal("could not initialize compression library: %s", zp->msg); > > /* Just be paranoid - maybe End is called after Start, with no Write */ > - zp->next_out = (void *) cs->zlibOut; > - zp->avail_out = cs->zlibOutSize; > + zp->next_out = gzipcs->outbuf; > + zp->avail_out = gzipcs->outsize; > + > + /* Keep track of gzipcs */ > + cs->private_data = gzipcs; > } > > static void > -EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs) > +DeflateCompressorEnd(ArchiveHandle *AH, CompressorState *cs) > { > - z_streamp zp = cs->zp; > + GzipCompressorState *gzipcs = (GzipCompressorState *) cs->private_data; > + z_streamp zp; > > + zp = gzipcs->zp; > zp->next_in = NULL; > zp->avail_in = 0; > > /* Flush any remaining data from zlib buffer */ > - DeflateCompressorZlib(AH, cs, true); > + DeflateCompressorCommon(AH, cs, true); > > if (deflateEnd(zp) != Z_OK) > pg_fatal("could not close compression stream: %s", zp->msg); > > - free(cs->zlibOut); > - free(cs->zp); > + pg_free(gzipcs->outbuf); > + pg_free(gzipcs->zp); > + pg_free(gzipcs); > + cs->private_data = NULL; > } > > static void > -DeflateCompressorZlib(ArchiveHandle *AH, CompressorState *cs, bool flush) > +DeflateCompressorCommon(ArchiveHandle *AH, CompressorState *cs, bool flush) > { > - z_streamp zp = cs->zp; > - char *out = cs->zlibOut; > + GzipCompressorState *gzipcs = (GzipCompressorState *) cs->private_data; > + z_streamp zp = gzipcs->zp; > + void *out = gzipcs->outbuf; > int res = Z_OK; > > - while (cs->zp->avail_in != 0 || 
flush) > + while (gzipcs->zp->avail_in != 0 || flush) > { > res = deflate(zp, flush ? Z_FINISH : Z_NO_FLUSH); > if (res == Z_STREAM_ERROR) > pg_fatal("could not compress data: %s", zp->msg); > - if ((flush && (zp->avail_out < cs->zlibOutSize)) > + if ((flush && (zp->avail_out < gzipcs->outsize)) > || (zp->avail_out == 0) > || (zp->avail_in != 0) > ) > @@ -289,18 +122,18 @@ DeflateCompressorZlib(ArchiveHandle *AH, CompressorState *cs, bool flush) > * chunk is the EOF marker in the custom format. This should never > * happen but ... > */ > - if (zp->avail_out < cs->zlibOutSize) > + if (zp->avail_out < gzipcs->outsize) > { > /* > * Any write function should do its own error checking but to > * make sure we do a check here as well ... > */ > - size_t len = cs->zlibOutSize - zp->avail_out; > + size_t len = gzipcs->outsize - zp->avail_out; > > - cs->writeF(AH, out, len); > + cs->writeF(AH, (char *) out, len); > } > - zp->next_out = (void *) out; > - zp->avail_out = cs->zlibOutSize; > + zp->next_out = out; > + zp->avail_out = gzipcs->outsize; > } > > if (res == Z_STREAM_END) > @@ -309,16 +142,26 @@ DeflateCompressorZlib(ArchiveHandle *AH, CompressorState *cs, bool flush) > } > > static void > -WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState *cs, > - const char *data, size_t dLen) > +EndCompressorGzip(ArchiveHandle *AH, CompressorState *cs) > { > - cs->zp->next_in = (void *) unconstify(char *, data); > - cs->zp->avail_in = dLen; > - DeflateCompressorZlib(AH, cs, false); > + /* If deflation was initialized, finalize it */ > + if (cs->private_data) > + DeflateCompressorEnd(AH, cs); > } > > static void > -ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF) > +WriteDataToArchiveGzip(ArchiveHandle *AH, CompressorState *cs, > + const void *data, size_t dLen) > +{ > + GzipCompressorState *gzipcs = (GzipCompressorState *) cs->private_data; > + > + gzipcs->zp->next_in = (void *) unconstify(void *, data); > + gzipcs->zp->avail_in = dLen; > + 
DeflateCompressorCommon(AH, cs, false); > +} > + > +static void > +ReadDataFromArchiveGzip(ArchiveHandle *AH, CompressorState *cs) > { > z_streamp zp; > char *out; > @@ -342,7 +185,7 @@ ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF) > zp->msg); > > /* no minimal chunk size for zlib */ > - while ((cnt = readF(AH, &buf, &buflen))) > + while ((cnt = cs->readF(AH, &buf, &buflen))) > { > zp->next_in = (void *) buf; > zp->avail_in = cnt; > @@ -382,389 +225,196 @@ ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF) > free(out); > free(zp); > } [... more incomprehensible changes elided ...]
On 3/16/23 23:58, Justin Pryzby wrote: > On Thu, Mar 16, 2023 at 11:30:50PM +0100, Tomas Vondra wrote: >> On 3/16/23 01:20, Justin Pryzby wrote: >>> But try reading the diff while looking for the cause of a bug. It's the >>> difference between reading 50, two-line changes, and reading a hunk that >>> replaces 100 lines with a different 100 lines, with empty/unrelated >>> lines randomly thrown in as context. >> >> I don't know, maybe I'm doing something wrong or maybe I just am bad at >> looking at diffs, but if I apply the patch you submitted on 8/3 and do >> the git-diff above (output attached), it seems pretty incomprehensible >> to me :-( I don't see 50 two-line changes (I certainly wouldn't be able >> to identify the root cause of the bug based on that). > > It's true that most of the diff is still incomprehensible... > > But look at the part relevant to the "empty-data" bug: > Well, yeah. If you know where to look, and if you squint just the right way, then you can see any bug. I don't think I'd be able to spot the bug in the diff unless I knew in advance what the bug is. That being said, I don't object to moving the function etc. Unless there are alternative ideas how to fix the empty-data issue, I'll get this committed after playing with it a bit more. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Thursday, March 16th, 2023 at 10:20 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 3/16/23 18:04, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: > > > > > On 3/14/23 16:18, gkokolatos@pm.me wrote: > > > > > > > ...> Would you mind me trying to come with a patch to address your points? > > > > > > That'd be great, thanks. Please keep it split into smaller patches - two > > > might work, with one patch for "cosmetic" changes and the other tweaking > > > the API error-handling stuff. > > > > Please find attached a set for it. I will admit that the splitting in the > > series might not be ideal or what you requested. It is split on what > > seemed like logical units. Please advise how a better split could look. > > > > 0001 is unifying types and return values on the API > > 0002 is addressing the constant definitions > > 0003 is your previous 0004 adding comments > > > Thanks. I think the split seems reasonable - the goal was to not mix > different changes, and from that POV it works. > > I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I > mean, gzread/gzwrite returns int, so how does renaming the size_t > variable solve the issue of negative values for errors? I mean, this > > - size_t ret; > + size_t gzret; > > - ret = gzread(gzfp, ptr, size); > + gzret = gzread(gzfp, ptr, size); > > means we still lost the information gzread() returned a negative value, > no? We'll still probably trigger an error, but it's a bit weird. You are obviously correct. My bad, I misread the return type of gzread(). Please find an amended version attached. > Unless I'm missing something, if gzread() ever returns -1 or some other > negative error value, we'll cast it to size_t, while condition will > evaluate to "true" and we'll happily chew on some random chunk of data. 
> > So the confusion is (at least partially) a preexisting issue ... > > For gzwrite() it seems to be fine, because that only returns 0 on error. > OTOH it's defined to take 'int size' but then we happily pass size_t > values to it. > > As I wrote earlier, this apparently assumes we never need to deal with > buffers larger than int, and I don't think we have the ambition to relax > that (I'm not sure it's even needed / possible). Agreed. > I see the read/write functions are now defined as int, but we only ever > return 0/1 from them, and then interpret that as bool. Why not to define > it like that? I don't think we need to adhere to the custom that > everything returns "int". This is an internal API. Or if we want to > stick to int, I'd define meaningful "nice" constants for 0/1. The return types are now booleans and the callers have been made aware. > 0002 seems fine to me. I see you've ditched the idea of having two > separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine > with me, although I wonder if this might have negative impact on > performance or something (but I doubt that). > I doubt that too. Thank you. > 0003 seems fine too. Thank you. > > As far as the error handling is concerned, you had said upthread: > > > > > I think the right approach is to handle all library errors and not just > > > let them through. So Gzip_write() needs to check the return value, and > > > either call pg_fatal() or translate it to an error defined by the API. > > > > While working on it, I thought it would be clearer and more consistent > > for the pg_fatal() to be called by the caller of the individual functions. > > Each individual function can keep track of the specifics of the error > > internally. Then the caller upon detecting that there was an error by > > checking the return value, can call pg_fatal() with a uniform error > > message and then add the specifics by calling the get_error_func(). > > > I agree it's cleaner the way you did it. 
> > I was thinking that with each compression function handling error > internally, the callers would not need to do that. But I haven't > realized there's logic to detect ENOSPC and so on, and we'd need to > duplicate that in every compression func. > If you agree, I can prepare a patch to improve on the error handling aspect of the API as a separate thread, since here we are trying to focus on correctness. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
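The gzread() pitfall described above is easy to reproduce in isolation: once a negative int return lands in a size_t, the error becomes a huge "byte count" and the sign check can never fire. A self-contained sketch, with fake_gzread_error() standing in for zlib's int-returning gzread():

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for gzread(): zlib's gzread() returns int, with -1 on error. */
static int
fake_gzread_error(void)
{
	return -1;
}

/* Buggy pattern: the sign is lost once the result lands in a size_t. */
static int
read_with_size_t(void)
{
	size_t		ret = (size_t) fake_gzread_error();

	return ret > 0;				/* (size_t) -1 is huge, so this "succeeds" */
}

/* Correct pattern: keep the int and test the sign before using it. */
static int
read_with_int(void)
{
	int			gzret = fake_gzread_error();

	if (gzret < 0)
		return 0;				/* error detected */
	return 1;
}
```

The first function reports success on an error return; the second catches it, which is why keeping the variable as int matters here.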
On 3/17/23 16:43, gkokolatos@pm.me wrote: >> >> ... >> >> I agree it's cleaner the way you did it. >> >> I was thinking that with each compression function handling error >> internally, the callers would not need to do that. But I haven't >> realized there's logic to detect ENOSPC and so on, and we'd need to >> duplicate that in every compression func. >> > > If you agree, I can prepare a patch to improve on the error handling > aspect of the API as a separate thread, since here we are trying to > focus on correctness. > Yes, that makes sense. There are far too many patches in this thread already ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, I was preparing to get the 3 cleanup patches pushed, so I updated/reworded the commit messages a bit (attached, please check). But I noticed the commit message for 0001 said: In passing save the appropriate errno in LZ4File_open_write in case that the caller is not using the API's get_error_func. I think that's far too low-level for a commit message, it'd be much more appropriate for a comment at the function. However, do we even need this behavior? I was looking for code calling this function without using get_error_func(), but no luck. And if there is such caller, shouldn't we fix it to use get_error_func()? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
On Fri, Mar 17, 2023 at 03:43:58PM +0000, gkokolatos@pm.me wrote: > From a174cdff4ec8aad59f5bcc7e8d52218a14fe56fc Mon Sep 17 00:00:00 2001 > From: Georgios Kokolatos <gkokolatos@pm.me> > Date: Fri, 17 Mar 2023 14:45:58 +0000 > Subject: [PATCH v3 1/3] Improve type handling in pg_dump's compress file API > -int > +bool > EndCompressFileHandle(CompressFileHandle *CFH) > { > - int ret = 0; > + bool ret = 0; Should say "= false" ? > /* > * Write 'size' bytes of data into the file from 'ptr'. > + * > + * Returns true on success and false on error. > + */ > + bool (*write_func) (const void *ptr, size_t size, > - * Get a pointer to a string that describes an error that occurred during a > - * compress file handle operation. > + * Get a pointer to a string that describes an error that occurred during > + * a compress file handle operation. > */ > const char *(*get_error_func) (CompressFileHandle *CFH); This should mention that the error accessible in error_func() applies (only) to write_func() ? As long as this touches pg_backup_directory.c you could update the header comment to refer to "compressed extensions", not just .gz. I noticed that EndCompressorLZ4() tests "if (LZ4cs)", but that should always be true. I was able to convert the zstd patch to this new API with no issue. -- Justin
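The caller-side pattern implied by the bool-returning API above can be sketched as follows; DemoFileHandle and its members are illustrative stand-ins for CompressFileHandle, and the string returned on failure is what would feed a uniform pg_fatal() message in pg_dump:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for CompressFileHandle; only the shape matters. */
typedef struct DemoFileHandle DemoFileHandle;
struct DemoFileHandle
{
	bool		(*write_func) (const void *ptr, size_t size, DemoFileHandle *CFH);
	const char *(*get_error_func) (DemoFileHandle *CFH);
	const char *last_error;		/* specifics tracked internally */
};

static bool
failing_write(const void *ptr, size_t size, DemoFileHandle *CFH)
{
	(void) ptr;
	(void) size;
	CFH->last_error = "out of space";	/* compressor records the specifics */
	return false;
}

static const char *
demo_get_error(DemoFileHandle *CFH)
{
	return CFH->last_error;
}

/*
 * Caller pattern: check the bool, then fetch the specifics -- in pg_dump
 * the returned string would go into the pg_fatal() message.
 */
static const char *
demo_caller(DemoFileHandle *CFH)
{
	if (!CFH->write_func("data", 4, CFH))
		return CFH->get_error_func(CFH);
	return NULL;
}
```

This also shows why the header comment could note that get_error_func() reports the error from the last failed callback: the handle itself carries the error state.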
Hi, I looked at this again, and I realized I misunderstood the bit about errno in LZ4File_open_write a bit. I now see it simply just brings the function in line with Gzip_open_write(), so that the callers can just do pg_fatal("%m"). I still think the special "errno" handling in this one place feels a bit random, and handling it by get_error_func() would be nicer, but we can leave that for a separate patch - no need to block these changes because of that. So pushed all three parts, after updating the commit messages a bit. This leaves the empty-data issue (which we have a fix for) and the switch to LZ4F. And then the zstd part. On 3/20/23 23:40, Justin Pryzby wrote: > On Fri, Mar 17, 2023 at 03:43:58PM +0000, gkokolatos@pm.me wrote: >> From a174cdff4ec8aad59f5bcc7e8d52218a14fe56fc Mon Sep 17 00:00:00 2001 >> From: Georgios Kokolatos <gkokolatos@pm.me> >> Date: Fri, 17 Mar 2023 14:45:58 +0000 >> Subject: [PATCH v3 1/3] Improve type handling in pg_dump's compress file API > >> -int >> +bool >> EndCompressFileHandle(CompressFileHandle *CFH) >> { >> - int ret = 0; >> + bool ret = 0; > > Should say "= false" ? > Right, fixed. >> /* >> * Write 'size' bytes of data into the file from 'ptr'. >> + * >> + * Returns true on success and false on error. >> + */ >> + bool (*write_func) (const void *ptr, size_t size, > >> - * Get a pointer to a string that describes an error that occurred during a >> - * compress file handle operation. >> + * Get a pointer to a string that describes an error that occurred during >> + * a compress file handle operation. >> */ >> const char *(*get_error_func) (CompressFileHandle *CFH); > > This should mention that the error accessible in error_func() applies (only) to > write_func() ? > > As long as this touches pg_backup_directory.c you could update the > header comment to refer to "compressed extensions", not just .gz. > > I noticed that EndCompressorLZ4() tests "if (LZ4cs)", but that should > always be true. > I haven't done these two things. 
We can/should do that, but it didn't fit into the three patches. > I was able to convert the zstd patch to this new API with no issue. > Good to hear. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > So pushed all three parts, after updating the commit messages a bit. Thank you very much. > > This leaves the empty-data issue (which we have a fix for) and the > switch to LZ4F. And then the zstd part. Please expect promptly a patch for the switch to frames. Cheers, //Georgios
------- Original Message ------- On Thursday, March 16th, 2023 at 11:30 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 3/16/23 01:20, Justin Pryzby wrote: > > > On Mon, Mar 13, 2023 at 10:47:12PM +0100, Tomas Vondra wrote: > > > > > > > > Thanks. I don't want to annoy you too much, but could you split the > > > patch into the "empty-data" fix and all the other changes (rearranging > > > functions etc.)? I'd rather not mix those in the same commit. > > > > I don't know if that makes sense? The "empty-data" fix creates a new > > function called DeflateCompressorInit(). My proposal was to add the new > > function in the same place in the file as it used to be. > > > Got it. In that case I agree it's fine to do that in a single commit. For what it's worth, I think that this patch should get a +1 and get in. It solves the empty writes problem and includes a test for a previously untested case. Cheers, //Georgios > > > The patch also moves the pg_fatal() that's being removed. I don't think > > it's going to look any cleaner to read a history involving the > > pg_fatal() first being added, then moved, then removed. Anyway, I'll > > wait while the community continues discussion about the pg_fatal(). > > > I think the agreement was to replace the pg_fatal with an assert, and I > see your patch already does that. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
------- Original Message ------- On Friday, March 24th, 2023 at 10:30 AM, gkokolatos@pm.me <gkokolatos@pm.me> wrote: > > ------- Original Message ------- > On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: > > > This leaves the empty-data issue (which we have a fix for) and the > > switch to LZ4F. And then the zstd part. > > Please expect promptly a patch for the switch to frames. Please find the expected patch attached. Note that the bulk of the patch is code unification, variable renaming to something more appropriate, and comment addition. These are changes that are not strictly necessary to switch to LZ4F. I do believe they are essential for code hygiene after the switch and they do belong on the same commit. Cheers, //Georgios > > Cheers, > //Georgios
Attachments
On 3/28/23 18:07, gkokolatos@pm.me wrote: > > > > > > ------- Original Message ------- > On Friday, March 24th, 2023 at 10:30 AM, gkokolatos@pm.me <gkokolatos@pm.me> wrote: > >> >> ------- Original Message ------- >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: >> >>> This leaves the empty-data issue (which we have a fix for) and the >>> switch to LZ4F. And then the zstd part. >> >> Please expect promptly a patch for the switch to frames. > > Please find the expected patch attached. Note that the bulk of the > patch is code unification, variable renaming to something more > appropriate, and comment addition. These are changes that are not > strictly necessary to switch to LZ4F. I do believe that are > essential for code hygiene after the switch and they do belong > on the same commit. > Thanks! I agree the renames & cleanup are appropriate - it'd be silly to stick to misleading naming etc. Would it make sense to split the patch into two, to separate the renames and the switch to lz4f? That'd make the changes necessary for the lz4f switch clearer. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Mar 28, 2023 at 06:40:03PM +0200, Tomas Vondra wrote: > On 3/28/23 18:07, gkokolatos@pm.me wrote: > > ------- Original Message ------- > > On Friday, March 24th, 2023 at 10:30 AM, gkokolatos@pm.me <gkokolatos@pm.me> wrote: > > > >> ------- Original Message ------- > >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: > >> > >>> This leaves the empty-data issue (which we have a fix for) and the > >>> switch to LZ4F. And then the zstd part. > >> > >> Please expect promptly a patch for the switch to frames. > > > > Please find the expected patch attached. Note that the bulk of the > > patch is code unification, variable renaming to something more > > appropriate, and comment addition. These are changes that are not > > strictly necessary to switch to LZ4F. I do believe that are > > essential for code hygiene after the switch and they do belong > > on the same commit. > > Thanks! > > I agree the renames & cleanup are appropriate - it'd be silly to stick > to misleading naming etc. Would it make sense to split the patch into > two, to separate the renames and the switch to lz4f? > That'd make it the changes necessary for lz4f switch clearer. I don't think so. Did you mean separate commits only for review ? The patch is pretty readable - the File API has just some renames, and the compressor API is what's being replaced, which isn't going to be any more clear. @Georgios: did you consider using a C union in LZ4State, to separate the parts used by the different APIs ? -- Justin
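For the record, the union idea could look roughly like the following; the field names are hypothetical, since the real LZ4State carries liblz4 contexts and buffers — here plain fields stand in for the compressor-API and stream-API halves, of which only one is live at a time:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical shape only.  The two structs inside the union model the
 * private state of the CompressorState path and the CompressFileHandle
 * path; a given DemoLZ4State ever uses only one of them.
 */
typedef struct
{
	int			compressing;	/* selects which half of the union is live */
	union
	{
		struct					/* CompressorState path */
		{
			char	   *outbuf;
			size_t		outsize;
		}			c;
		struct					/* CompressFileHandle path */
		{
			void	   *fp;
			size_t		buflen;
		}			f;
	}			u;
} DemoLZ4State;
```

Besides documenting which members belong to which API, the union keeps the struct no larger than its bigger member set.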
On 3/28/23 18:07, gkokolatos@pm.me wrote: > > ------- Original Message ------- > On Friday, March 24th, 2023 at 10:30 AM, gkokolatos@pm.me <gkokolatos@pm.me> wrote: > >> >> ------- Original Message ------- >> On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: >> >>> This leaves the empty-data issue (which we have a fix for) and the >>> switch to LZ4F. And then the zstd part. >> >> Please expect promptly a patch for the switch to frames. > > Please find the expected patch attached. Note that the bulk of the > patch is code unification, variable renaming to something more > appropriate, and comment addition. These are changes that are not > strictly necessary to switch to LZ4F. I do believe that are > essential for code hygiene after the switch and they do belong > on the same commit. > I think the patch is fine, but I'm wondering if the renames shouldn't go a bit further. It removes references to the LZ4File struct, but there's a bunch of functions with LZ4File_ prefix. Why not simply use LZ4_ prefix? We don't have GzipFile either. Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but then we probably should not define LZ4_compressor_init ... Also, maybe the comments shouldn't use "File API" when compress_io.c calls that "Compressed stream API". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/28/23 00:34, gkokolatos@pm.me wrote: > > ... > >> Got it. In that case I agree it's fine to do that in a single commit. > > For what is worth, I think that this patch should get a +1 and get in. It > solves the empty writes problem and includes a test to a previous untested > case. > Pushed, after updating / rewording the commit message a little bit. Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Wednesday, March 29th, 2023 at 12:02 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > On 3/28/23 18:07, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Friday, March 24th, 2023 at 10:30 AM, gkokolatos@pm.me gkokolatos@pm.me wrote: > > > > > ------- Original Message ------- > > > On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote: > > > > > > > This leaves the empty-data issue (which we have a fix for) and the > > > > switch to LZ4F. And then the zstd part. > > > > > > Please expect promptly a patch for the switch to frames. > > > > Please find the expected patch attached. Note that the bulk of the > > patch is code unification, variable renaming to something more > > appropriate, and comment addition. These are changes that are not > > strictly necessary to switch to LZ4F. I do believe that are > > essential for code hygiene after the switch and they do belong > > on the same commit. > > > I think the patch is fine, but I'm wondering if the renames shouldn't go > a bit further. It removes references to LZ4File struct, but there's a > bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_ > prefix? We don't have GzipFile either. > > Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but > then we probably should not define LZ4_compressor_init ... This is a good point. The initial thought was that since lz4.h is now removed, such ambiguity will not be present. In v2 of the patch the function is renamed to `LZ4State_compression_init` since this name better describes its purpose. It initializes the LZ4State for compression. As for the LZ4File_ prefix, I have no objections. Please find the prefix changed to LZ4Stream_. For the record, the word 'File' is not unique to the lz4 implementation. 
The common data structure used by the API in compress_io.h: typedef struct CompressFileHandle CompressFileHandle; The public functions for this API are named: InitCompressFileHandle InitDiscoverCompressFileHandle EndCompressFileHandle And within InitCompressFileHandle the pattern is: if (compression_spec.algorithm == PG_COMPRESSION_NONE) InitCompressFileHandleNone(CFH, compression_spec); else if (compression_spec.algorithm == PG_COMPRESSION_GZIP) InitCompressFileHandleGzip(CFH, compression_spec); else if (compression_spec.algorithm == PG_COMPRESSION_LZ4) InitCompressFileHandleLZ4(CFH, compression_spec); It was felt that a prefix was required due to the inclusion of the 'lz4.h' header, where naming functions as 'LZ4_' would be wrong. The prefix 'LZ4File_' seemed to be in line with the naming of the rest of the relevant functions and structures. Other compressions, gzip and none, did not face the same issue. To conclude, I think that having a prefix is slightly preferred over not having one. 
Attachments
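The if/else chain quoted above is also where the "catalog of callbacks" idea floated earlier in the thread would apply: a table keyed by algorithm, so that adding a method (zstd, say) is one more row rather than one more branch. A toy sketch under invented names — the enum, entry struct, and init functions below are demo stand-ins, not the pg_dump API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
	DEMO_COMPRESSION_NONE,
	DEMO_COMPRESSION_GZIP,
	DEMO_COMPRESSION_LZ4
} DemoAlgorithm;

typedef struct
{
	DemoAlgorithm algorithm;
	const char *(*init) (void);	/* InitCompressFileHandleXxx analogue */
} DemoCompressorEntry;

static const char *init_none(void) { return "none"; }
static const char *init_gzip(void) { return "gzip"; }
static const char *init_lz4(void) { return "lz4"; }

/* One row per method; each row bundles the callbacks for that method. */
static const DemoCompressorEntry catalog[] = {
	{DEMO_COMPRESSION_NONE, init_none},
	{DEMO_COMPRESSION_GZIP, init_gzip},
	{DEMO_COMPRESSION_LZ4, init_lz4},
};

/* Table lookup replaces the if/else dispatch in InitCompressFileHandle. */
static const char *
demo_init_handle(DemoAlgorithm algorithm)
{
	size_t		i;

	for (i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
		if (catalog[i].algorithm == algorithm)
			return catalog[i].init();
	return NULL;				/* unknown algorithm */
}
```

A real version would of course store the full set of read/write/end callbacks per row, not just init.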
On 3/31/23 11:19, gkokolatos@pm.me wrote: > >> ... >> >> >> I think the patch is fine, but I'm wondering if the renames shouldn't go >> a bit further. It removes references to LZ4File struct, but there's a >> bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_ >> prefix? We don't have GzipFile either. >> >> Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but >> then we probably should not define LZ4_compressor_init ... > > This is a good point. The initial thought was that since lz4.h is now > removed, such ambiguity will not be present. In v2 of the patch the > function is renamed to `LZ4State_compression_init` since this name > describes better its purpose. It initializes the LZ4State for > compression. > > As for the LZ4File_ prefix, I have no objections. Please find the > prefix changed to LZ4Stream_. For the record, the word 'File' is not > unique to the lz4 implementation. The common data structure used by > the API in compress_io.h: > > typedef struct CompressFileHandle CompressFileHandle; > > The public functions for this API are named: > > InitCompressFileHandle > InitDiscoverCompressFileHandle > EndCompressFileHandle > > And within InitCompressFileHandle the pattern is: > > if (compression_spec.algorithm == PG_COMPRESSION_NONE) > InitCompressFileHandleNone(CFH, compression_spec); > else if (compression_spec.algorithm == PG_COMPRESSION_GZIP) > InitCompressFileHandleGzip(CFH, compression_spec); > else if (compression_spec.algorithm == PG_COMPRESSION_LZ4) > InitCompressFileHandleLZ4(CFH, compression_spec); > > It was felt that a prefix was required due to the inclusion 'lz4.h' > header where naming functions as 'LZ4_' would be wrong. The prefix > 'LZ4File_' seemed to be in line with the naming of the rest of > the relevant functions and structures. Other compressions, gzip and > none, did not face the same issue. > > To conclude, I think that having a prefix is slightly preferred > over not having one. 
Since the prefix `LZ4File_` is not desired, > I propose `LZ4Stream_` in v2. > > I will not object to dismissing the argument and drop `File` from > the prefix, if so requested. > Thanks. I think the LZ4Stream prefix is reasonable, so let's roll with that. I cleaned up the patch a little bit (mostly comment tweaks, etc.), updated the commit message and pushed it. The main tweak I did is renaming all the LZ4State variables from "fs" to state. The old name referred to the now abandoned "file state", but after the rename to LZ4State that seems confusing. Some of the places already used "state" and it's easier to know "state" is always LZ4State, so let's keep it consistent. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 27, 2023 at 02:33:04PM +0000, gkokolatos@pm.me wrote: > > > - Finally, the "Nothing to do in the default case" comment comes from > > > Michael's commit 5e73a6048: > > > > > > + /* > > > + * Custom and directory formats are compressed by default with gzip when > > > + * available, not the others. > > > + */ > > > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && > > > + !user_compression_defined) > > > { > > > #ifdef HAVE_LIBZ > > > - if (archiveFormat == archCustom || archiveFormat == archDirectory) > > > - compressLevel = Z_DEFAULT_COMPRESSION; > > > - else > > > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, > > > + &compression_spec); > > > +#else > > > + /* Nothing to do in the default case */ > > > #endif > > > - compressLevel = 0; > > > } > > > > > > As the comment says: for -Fc and -Fd, the compression is set to zlib, if > > > enabled, and when not otherwise specified by the user. > > > > > > Before 5e73a6048, this set compressLevel=0 for -Fp and -Ft, and when > > > zlib was unavailable. > > > > > > But I'm not sure why there's now an empty "#else". I also don't know > > > what "the default case" refers to. > > > > > > Maybe the best thing here is to move the preprocessor #if, since it's no > > > longer in the middle of a runtime conditional: > > > > > > #ifdef HAVE_LIBZ > > > + if ((archiveFormat == archCustom || archiveFormat == archDirectory) && > > > + !user_compression_defined) > > > + parse_compress_specification(PG_COMPRESSION_GZIP, NULL, > > > + &compression_spec); > > > #endif > > > > > > ...but that elicits a warning about "variable set but not used"... > > > > > > Not sure, I need to think about this a bit. > /* Nothing to do for the default case when LIBZ is not available */ > is easier to understand. Maybe I would write it as: "if zlib is unavailable, default to no compression". But I think that's best done in the leading comment, and not inside an empty preprocessor #else. 
I was hoping Michael would comment on this. The placement and phrasing of the comment makes no sense to me. -- Justin
On Tue, Apr 11, 2023 at 07:41:11PM -0500, Justin Pryzby wrote: > Maybe I would write it as: "if zlib is unavailable, default to no > compression". But I think that's best done in the leading comment, and > not inside an empty preprocessor #else. > > I was hoping Michael would comment on this. (Sorry for the late reply, somewhat missed that.) > The placement and phrasing of the comment makes no sense to me. Yes, this comment gives no value as it stands. I would be tempted to follow the suggestion to group the whole code block in a single ifdef, including the check, and remove this comment. Like the attached perhaps? -- Michael
Attachments
On Wed, Apr 12, 2023 at 10:07:08AM +0900, Michael Paquier wrote: > On Tue, Apr 11, 2023 at 07:41:11PM -0500, Justin Pryzby wrote: > > Maybe I would write it as: "if zlib is unavailable, default to no > > compression". But I think that's best done in the leading comment, and > > not inside an empty preprocessor #else. > > > > I was hoping Michael would comment on this. > > (Sorry for the late reply, somewhat missed that.) > > > The placement and phrasing of the comment makes no sense to me. > > Yes, this comment gives no value as it stands. I would be tempted to > follow the suggestion to group the whole code block in a single ifdef, > including the check, and remove this comment. Like the attached > perhaps? +1
On Tue, Apr 11, 2023 at 08:19:59PM -0500, Justin Pryzby wrote: > On Wed, Apr 12, 2023 at 10:07:08AM +0900, Michael Paquier wrote: >> Yes, this comment gives no value as it stands. I would be tempted to >> follow the suggestion to group the whole code block in a single ifdef, >> including the check, and remove this comment. Like the attached >> perhaps? > > +1 Let me try this one again, as the previous patch would cause a warning under --without-zlib as user_compression_defined would be unused. We could do something like the attached instead. It means doing twice parse_compress_specification() for the non-zlib path, still we are already doing so for the zlib path. If there are other ideas, feel free. -- Michael
Attachments
On Thu, Apr 13, 2023 at 07:23:48AM +0900, Michael Paquier wrote: > On Tue, Apr 11, 2023 at 08:19:59PM -0500, Justin Pryzby wrote: > > On Wed, Apr 12, 2023 at 10:07:08AM +0900, Michael Paquier wrote: > >> Yes, this comment gives no value as it stands. I would be tempted to > >> follow the suggestion to group the whole code block in a single ifdef, > >> including the check, and remove this comment. Like the attached > >> perhaps? > > > > +1 > > Let me try this one again, as the previous patch would cause a warning > under --without-zlib as user_compression_defined would be unused. We > could do something like the attached instead. It means doing twice > parse_compress_specification() for the non-zlib path, still we are > already doing so for the zlib path. > > If there are other ideas, feel free. I don't think you need to call parse_compress_specification(NONE). As you wrote it, if zlib is unavailable, there's no parse(NONE) call, even for directory and custom formats. And there's no parse(NONE) call for plain format when zlib is available. The old way had preprocessor #if around both the "if" and "else" - is that what you meant? If you don't insist on calling parse(NONE), the only change is to remove the empty #else, which was my original patch. "if no compression specification has been specified" is redundant with "by default", and causes "not the others" to dangle. If I were to rewrite the comment, it'd say: + * When gzip is available, custom and directory formats are compressed by + * default
On Wed, Apr 12, 2023 at 05:52:40PM -0500, Justin Pryzby wrote: > I don't think you need to call parse_compress_specification(NONE). > As you wrote it, if zlib is unavailable, there's no parse(NONE) call, > even for directory and custom formats. And there's no parse(NONE) call > for plain format when zlib is available. Yeah, that's not necessary, but I was wondering if it made the code a bit cleaner, or else the non-zlib path would rely on the default compression method string. > The old way had preprocessor #if around both the "if" and "else" - is > that what you meant? > > If you don't insist on calling parse(NONE), the only change is to remove > the empty #else, which was my original patch. Removing the empty #else has the problem of creating an empty if block, which could itself be a cause of warnings? > If I were to rewrite the comment, it'd say: > > + * When gzip is available, custom and directory formats are compressed by + * default Okay. -- Michael
Attachments
On Thu, Apr 13, 2023 at 09:37:06AM +0900, Michael Paquier wrote: > > If you don't insist on calling parse(NONE), the only change is to remove > > the empty #else, which was my original patch. > > Removing the empty #else has the problem of creating an empty if block, > which could itself be a cause of warnings? I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement with nothing but a comment, which isn't a problem. I think the only issue with an empty "if" is when you have no braces, like: if (...) #if ... something; #endif // problem here // -- Justin
On Wed, Apr 12, 2023 at 07:53:53PM -0500, Justin Pryzby wrote: > I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement > with nothing but a comment, which isn't a problem. > > I think the only issue with an empty "if" is when you have no braces, > like: > > if (...) > #if ... > something; > #endif > > // problem here // (My apologies for the late reply.) Still it could be easily messed up, and that's not a style that really exists in the tree, either, because there are always #else blocks set up in such cases. Another part that makes me a bit uncomfortable is that we would still call twice parse_compress_specification(), something that should not happen but we are doing so on HEAD because the default compression_algorithm_str is "none" and we want to enforce "gzip" for custom and directory formats when building with zlib. What about just moving this block a bit up, just before the compression spec parsing, then? If we set compression_algorithm_str, the specification is compiled with the expected default, once instead of twice. -- Michael
Attachments
------- Original Message ------- On Tuesday, April 25th, 2023 at 8:02 AM, Michael Paquier <michael@paquier.xyz> wrote: > > > On Wed, Apr 12, 2023 at 07:53:53PM -0500, Justin Pryzby wrote: > > > I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement > > with nothing but a comment, which isn't a problem. > > > > I think the only issue with an empty "if" is when you have no braces, > > like: > > > > if (...) > > #if ... > > something; > > #endif > > > > // problem here // > > > (My apologies for the late reply.) > > Still it could be easily messed up, and that's not a style that > really exists in the tree, either, because there are always #else > blocks set up in such cases. Another part that makes me a bit > uncomfortable is that we would still call twice > parse_compress_specification(), something that should not happen but > we are doing so on HEAD because the default compression_algorithm_str > is "none" and we want to enforce "gzip" for custom and directory > formats when building with zlib. > > What about just moving this block a bit up, just before the > compression spec parsing, then? If we set compression_algorithm_str, > the specification is compiled with the expected default, once instead > of twice. For what is worth, I think this would be the best approach. +1 Cheers, //Georgios > -- > Michael
On Wed, Apr 26, 2023 at 08:50:46AM +0000, gkokolatos@pm.me wrote: > For what is worth, I think this would be the best approach. +1 Thanks. I have gone with that, then! -- Michael
Attachments
23.03.2023 20:10, Tomas Vondra wrote: > So pushed all three parts, after updating the commit messages a bit. > > This leaves the empty-data issue (which we have a fix for) and the > switch to LZ4F. And then the zstd part. > I'm sorry that I haven't noticed/checked that before, but when trying to perform check-world with Valgrind I've discovered another issue presumably related to LZ4File_gets(). When running under Valgrind: PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/ I get: ... [07:07:11.683](0.000s) ok 1939 - compression_lz4_dir: glob check for .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4 # Running: pg_restore --jobs=2 --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir ==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised value(s) ==00:00:00:00.579 2811926== at 0x4853376: rawmemchr (vg_replace_strmem.c:1548) ==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal (strops.c:41) ==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95) ==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf (isoc99_sscanf.c:28) ==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458) ==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData (pg_backup_directory.c:422) ==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry (pg_backup_archiver.c:882) ==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive (pg_backup_archiver.c:699) ==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414) ==00:00:00:00.579 2811926== ... It looks like the line variable returned by gets_func() here is not null-terminated: while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL) { ... if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2) ... And Valgrind doesn't like it. Best regards, Alexander
------- Original Message ------- On Friday, May 5th, 2023 at 8:00 AM, Alexander Lakhin <exclusion@gmail.com> wrote: > > > 23.03.2023 20:10, Tomas Vondra wrote: > > > So pushed all three parts, after updating the commit messages a bit. > > > > This leaves the empty-data issue (which we have a fix for) and the > > switch to LZ4F. And then the zstd part. > > > I'm sorry that I haven't noticed/checked that before, but when trying to > perform check-world with Valgrind I've discovered another issue presumably > related to LZ4File_gets(). > When running under Valgrind: > PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/ > I get: > ... > 07:07:11.683 ok 1939 - compression_lz4_dir: glob check for > .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4 > # Running: pg_restore --jobs=2 --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql > .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir > > ==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised value(s) > ==00:00:00:00.579 2811926== at 0x4853376: rawmemchr (vg_replace_strmem.c:1548) > ==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal (strops.c:41) > ==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95) > ==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf (isoc99_sscanf.c:28) > ==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458) > ==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData (pg_backup_directory.c:422) > ==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry (pg_backup_archiver.c:882) > ==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive (pg_backup_archiver.c:699) > ==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414) > ==00:00:00:00.579 2811926== > ... > > It looks like the line variable returned by gets_func() here is not > null-terminated: > while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL) > > { > ... 
> if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2) > ... > And Valgrind doesn't like it. > Valgrind is correct to not like it. LZ4Stream_gets() got modeled after gets() when it should have been modeled after fgets(). Please find a patch attached to address it. Cheers, //Georgios > Best regards, > Alexander
Attachments
On 2023-05-05 Fr 06:02, gkokolatos@pm.me wrote:
> Valgrind is correct to not like it. LZ4Stream_gets() got modeled after
> gets() when it should have been modeled after fgets(). Please find a
> patch attached to address it.
Isn't using memset here a bit wasteful? Why not just put a null at the end after calling LZ4Stream_read_internal(), which tells you how many bytes it has written?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Friday, May 5th, 2023 at 3:23 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Isn't using memset here a bit wasteful? Why not just put a null at the
> end after calling LZ4Stream_read_internal(), which tells you how many
> bytes it has written?
Good point. I thought about it before submitting the patch. I concluded that given the complexity and operations involved in LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore code, the memset() call will be negligible. However from the readability point of view, the function is a bit cleaner with the memset().
I will not object to any suggestion though, as this is a very trivial point. Please find attached a v2 of the patch following the suggested approach.
Cheers,
//Georgios
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachments
On Fri, May 05, 2023 at 02:13:28PM +0000, gkokolatos@pm.me wrote: > Good point. I thought about it before submitting the patch. I > concluded that given the complexity and operations involved in > LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore > code, the memset() call will be negligible. However from the > readability point of view, the function is a bit cleaner with the > memset(). > > I will not object to any suggestion though, as this is a very > trivial point. Please find attached a v2 of the patch following the > suggested approach. Please note that an open item has been added for this stuff. -- Michael
Attachments
On Sat, May 6, 2023 at 04:51, Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, May 05, 2023 at 02:13:28PM +0000, gkokolatos@pm.me wrote:
> > Good point. I thought about it before submitting the patch. I
> > concluded that given the complexity and operations involved in
> > LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore
> > code, the memset() call will be negligible. However from the
> > readability point of view, the function is a bit cleaner with the
> > memset().
> >
> > I will not object to any suggestion though, as this is a very
> > trivial point. Please find attached a v2 of the patch following the
> > suggested approach.
>
> Please note that an open item has been added for this stuff.
> --
> Michael

Thank you but I am not certain I know what that means. Can you please explain?
On Sun, May 07, 2023 at 03:01:52PM +0000, gkokolatos@pm.me wrote: > Thank you but I am not certain I know what that means. Can you please explain? It means that this thread has been added to the following list: https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Open_Issues pg_dump/compress_lz4.c is new as of PostgreSQL 16, and this patch is fixing a deficiency. That's just a way outside of the commit fest to track any problems and make sure these are fixed before the release happens. -- Michael
Attachments
On Fri, May 05, 2023 at 02:13:28PM +0000, gkokolatos@pm.me wrote: > Good point. I thought about it before submitting the patch. I > concluded that given the complexity and operations involved in > LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore > code, the memset() call will be negligible. However from the > readability point of view, the function is a bit cleaner with the > memset(). > > I will not object to any suggestion though, as this is a very > trivial point. Please find attached a v2 of the patch following the > suggested approach. Hmm. I was looking at this patch, and what you are trying to do sounds rather right to keep a parallel with the gzip and zstd code paths. Looking at the code of gzread.c, gzgets() enforces a null-termination on the string read. Still, isn't that something we'd better enforce in read_none() as well? compress_io.h lists this as a requirement of the callback, and Zstd_gets() does so already. read_none() does not enforce that, unfortunately. + /* No work needs to be done for a zero-sized output buffer */ + if (size <= 0) + return 0; Indeed. This should be OK. - ret = LZ4Stream_read_internal(state, ptr, size, true); + Assert(size > 1); The addition of this assertion is a bit surprising, and this is inconsistent with Zstd_gets where a length of 1 is authorized. We should be more consistent across all the callbacks, IMO, not less, so as we apply the same API contract across all the compression methods. While testing this patch, I have triggered an error pointing out that the decompression path of LZ4 is broken for table data. 
I can reproduce that with a dump of the regression database, as of: make installcheck pg_dump --format=d --file=dump_lz4 --compress=lz4 regression createdb regress_lz4 pg_restore --format=d -d regress_lz4 dump_lz4 pg_restore: error: COPY failed for table "clstr_tst": ERROR: extra data after last expected column CONTEXT: COPY clstr_tst, line 15: "32 6 seis xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy..." pg_restore: warning: errors ignored on restore: 1 This does not show up with gzip or zstd, and the patch does not influence the result. In short it shows up with and without the patch, on HEAD. That does not look really stable :/ -- Michael
Attachments
Michael Paquier <michael@paquier.xyz> writes: > While testing this patch, I have triggered an error pointing out that > the decompression path of LZ4 is broken for table data. I can > reproduce that with a dump of the regression database, as of: > make installcheck > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression > createdb regress_lz4 > pg_restore --format=d -d regress_lz4 dump_lz4 > pg_restore: error: COPY failed for table "clstr_tst": ERROR: extra data after last expected column > CONTEXT: COPY clstr_tst, line 15: "32 6 seis xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy..." > pg_restore: warning: errors ignored on restore: 1 Ugh. Reproduced here ... so we need an open item for this. regards, tom lane
On Sun, May 07, 2023 at 09:09:25PM -0400, Tom Lane wrote: > Ugh. Reproduced here ... so we need an open item for this. Yep. Already added. -- Michael
Attachments
I wrote: > Michael Paquier <michael@paquier.xyz> writes: >> While testing this patch, I have triggered an error pointing out that >> the decompression path of LZ4 is broken for table data. I can >> reproduce that with a dump of the regression database, as of: >> make installcheck >> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression > Ugh. Reproduced here ... so we need an open item for this. BTW, it seems to work with --format=c. regards, tom lane
On 5/7/23 17:01, gkokolatos@pm.me wrote: > On Sat, May 6, 2023 at 04:51, Michael Paquier <michael@paquier.xyz> wrote: >> On Fri, May 05, 2023 at 02:13:28PM +0000, gkokolatos@pm.me wrote: >> > Good point. I thought about it before submitting the patch. I >> > concluded that given the complexity and operations involved in >> > LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore >> > code, the memset() call will be negligible. However from the >> > readability point of view, the function is a bit cleaner with the >> > memset(). >> > >> > I will not object to any suggestion though, as this is a very >> > trivial point. Please find attached a v2 of the patch following the >> > suggested approach. >> >> Please note that an open item has been added for this stuff. > Thank you but I am not certain I know what that means. Can you please > explain? It means it was added to the list of items we need to fix before PG16 gets out: https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Monday, May 8th, 2023 at 3:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > I wrote: > > > Michael Paquier michael@paquier.xyz writes: > > > > > While testing this patch, I have triggered an error pointing out that > > > the decompression path of LZ4 is broken for table data. I can > > > reproduce that with a dump of the regression database, as of: > > > make installcheck > > > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression > > > Ugh. Reproduced here ... so we need an open item for this. > > > BTW, it seems to work with --format=c. > > regards, tom lane Thank you for the extra tests. It seems there is a gap in the test coverage. Please find attached a patch that addresses the issue and attempts to provide tests for it. Cheers, //Georgios
Attachments
On 5/8/23 03:16, Tom Lane wrote: > I wrote: >> Michael Paquier <michael@paquier.xyz> writes: >>> While testing this patch, I have triggered an error pointing out that >>> the decompression path of LZ4 is broken for table data. I can >>> reproduce that with a dump of the regression database, as of: >>> make installcheck >>> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression > >> Ugh. Reproduced here ... so we need an open item for this. > > BTW, it seems to work with --format=c. > The LZ4Stream_write() forgot to move the pointer to the next chunk, so it was happily decompressing the initial chunk over and over. A bit embarrassing oversight :-( The custom format calls WriteDataToArchiveLZ4(), which was correct. The attached patch fixes this for me. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
On 5/8/23 18:19, gkokolatos@pm.me wrote: > > > > > > ------- Original Message ------- > On Monday, May 8th, 2023 at 3:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> >> >> I wrote: >> >>> Michael Paquier michael@paquier.xyz writes: >>> >>>> While testing this patch, I have triggered an error pointing out that >>>> the decompression path of LZ4 is broken for table data. I can >>>> reproduce that with a dump of the regression database, as of: >>>> make installcheck >>>> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression >> >>> Ugh. Reproduced here ... so we need an open item for this. >> >> >> BTW, it seems to work with --format=c. >> > > Thank you for the extra tests. It seems there is a gap in the test > coverage. Please find attached a patch that addresses the issue > and attempts to provide tests for it. > Seems I'm getting messages with a delay - this is mostly the same fix I ended up with, not realizing you already posted a fix. I don't think we need the local "in" variable - the pointer parameter is local in the function, so we can modify it directly (with a cast). WriteDataToArchiveLZ4 does it that way too. The tests are definitely a good idea. I wonder if we should add a comment to DEFAULT_IO_BUFFER_SIZE mentioning that if we choose to increase the value in the future, we need to tweak the tests too to use more data in order to exercise the buffering etc. Maybe it's obvious? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Monday, May 8th, 2023 at 8:20 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > > On 5/8/23 18:19, gkokolatos@pm.me wrote: > > > ------- Original Message ------- > > On Monday, May 8th, 2023 at 3:16 AM, Tom Lane tgl@sss.pgh.pa.us wrote: > > > > > I wrote: > > > > > > > Michael Paquier michael@paquier.xyz writes: > > > > > > > > > While testing this patch, I have triggered an error pointing out that > > > > > the decompression path of LZ4 is broken for table data. I can > > > > > reproduce that with a dump of the regression database, as of: > > > > > make installcheck > > > > > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression > > > > > > > Ugh. Reproduced here ... so we need an open item for this. > > > > > > BTW, it seems to work with --format=c. > > > > Thank you for the extra tests. It seems there is a gap in the test > > coverage. Please find attached a patch that addresses the issue > > and attempts to provide tests for it. > > > Seems I'm getting messages with a delay - this is mostly the same fix I > ended up with, not realizing you already posted a fix. Thank you very much for looking. > I don't think we need the local "in" variable - the pointer parameter is > local in the function, so we can modify it directly (with a cast). > WriteDataToArchiveLZ4 does it that way too. Sure, patch updated. > The tests are definitely a good idea. Thank you. > I wonder if we should add a > comment to DEFAULT_IO_BUFFER_SIZE mentioning that if we choose to > increase the value in the future, we need to tweak the tests too to use > more data in order to exercise the buffering etc. Maybe it's obvious? > You are right. Added a comment both in the header and in the test. I hope v2 gets closer to closing the open item for this. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote: > The LZ4Stream_write() forgot to move the pointer to the next chunk, so > it was happily decompressing the initial chunk over and over. A bit > embarrassing oversight :-( > > The custom format calls WriteDataToArchiveLZ4(), which was correct. > > The attached patch fixes this for me. Ouch. So this was corrupting the dumps and the compression when trying to write more than two chunks at once, not the decompression steps. That addresses the issue here as well, thanks! -- Michael
Attachments
On 5/9/23 00:10, Michael Paquier wrote: > On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote: >> The LZ4Stream_write() forgot to move the pointer to the next chunk, so >> it was happily decompressing the initial chunk over and over. A bit >> embarrassing oversight :-( >> >> The custom format calls WriteDataToArchiveLZ4(), which was correct. >> >> The attached patch fixes this for me. > > Ouch. So this was corrupting the dumps and the compression when > trying to write more than two chunks at once, not the decompression > steps. That addresses the issue here as well, thanks! Yeah. Thanks for the report, should have been found during review. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
------- Original Message ------- On Tuesday, May 9th, 2023 at 2:54 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > On 5/9/23 00:10, Michael Paquier wrote: > > > On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote: > > > > > The LZ4Stream_write() forgot to move the pointer to the next chunk, so > > > it was happily decompressing the initial chunk over and over. A bit > > > embarrassing oversight :-( > > > > > > The custom format calls WriteDataToArchiveLZ4(), which was correct. > > > > > > The attached patch fixes this for me. > > > > Ouch. So this was corrupting the dumps and the compression when > > trying to write more than two chunks at once, not the decompression > > steps. That addresses the issue here as well, thanks! > > > Yeah. Thanks for the report, should have been found during review. Thank you both for looking. A small consolation is that now there are tests for this case. Moving on to the other open item for this, please find attached v2 of the patch as requested. Cheers, //Georgios > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachments
On Tue, May 09, 2023 at 02:12:44PM +0000, gkokolatos@pm.me wrote: > Thank you both for looking. A small consolation is that now there are > tests for this case. +1, noticing that was pure luck ;) Worth noting that the patch posted in [1] has these tests, not the version posted in [2]. + create_sql => 'INSERT INTO dump_test.test_compression_method (col1) ' + . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;', Yep, good and cheap idea to check for longer chunks. That should be enough to loop twice. [1]: https://www.postgresql.org/message-id/SYTRcNgtAbzyn3y3IInh1x-UfNTKMNpnFvI3mr6SyqyVf3PkaDsMy_cpKKgsl3_HdLy2MFAH4zwjxDmFfiLO8rWtSiJWBtqT06OMjeNo4GA=@pm.me [2]: https://www.postgresql.org/message-id/f735df01-0bb4-2fbc-1297-73a520cfc534@enterprisedb.com > Moving on to the other open item for this, please find attached v2 > of the patch as requested. Did you notice the comments of [3] about the second patch that aims to add the null termination in the line from the LZ4 fgets() callback? [3]: https://www.postgresql.org/message-id/ZFhCyn4Gm2eu60rB@paquier.xyz -- Michael
Attachments
On Tue, May 09, 2023 at 02:54:31PM +0200, Tomas Vondra wrote: > Yeah. Thanks for the report, should have been found during review. Tomas, are you planning to do something by the end of this week for beta1? Or do you need some help of any kind? -- Michael
Attachments
On 5/17/23 08:18, Michael Paquier wrote: > On Tue, May 09, 2023 at 02:54:31PM +0200, Tomas Vondra wrote: >> Yeah. Thanks for the report, should have been found during review. > > Tomas, are you planning to do something by the end of this week for > beta1? Or do you need some help of any kind? I'll take care of it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 5/17/23 10:59, Tomas Vondra wrote: > On 5/17/23 08:18, Michael Paquier wrote: >> On Tue, May 09, 2023 at 02:54:31PM +0200, Tomas Vondra wrote: >>> Yeah. Thanks for the report, should have been found during review. >> >> Tomas, are you planning to do something by the end of this week for >> beta1? Or do you need some help of any kind? > > I'll take care of it. > FWIW I've pushed fixes for both open issues associated with the pg_dump compression. I'll keep an eye on the buildfarm, but hopefully that'll do it for beta1. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company