Обсуждение: Re: pgsql: Don't trust unvalidated xl_tot_len.
Re: Thomas Munro > Don't trust unvalidated xl_tot_len. > src/test/recovery/t/039_end_of_wal.pl | 460 ++++++++++++++++++++++++++++++++ I haven't investigated the details yet, and it's not affecting the builds on apt.postgresql.org, but the Debian amd64 and i386 regression tests just failed this test on PG13 (11 and 15 are ok): t/039_end_of_wal.pl .................. Dubious, test returned 2 (wstat 512, 0x200) No subtests run Test Summary Report ------------------- t/039_end_of_wal.pl (Wstat: 512 Tests: 0 Failed: 0) Non-zero exit status: 2 Parse errors: No plan found in TAP output Files=26, Tests=254, 105 wallclock secs ( 0.10 usr 0.02 sys + 19.58 cusr 11.02 csys = 30.72 CPU) Result: FAIL make[2]: *** [Makefile:23: check] Error 1 https://salsa.debian.org/postgresql/postgresql/-/jobs/4910354 https://salsa.debian.org/postgresql/postgresql/-/jobs/4910355 Christoph
Re: To Thomas Munro > I haven't investigated the details yet, and it's not affecting the > builds on apt.postgresql.org, but the Debian amd64 and i386 regression > tests just failed this test on PG13 (11 and 15 are ok): That's on Debian bullseye, fwiw. (But the 13 build on apt.pg.o/bullseye passed.) Christoph
Re: To Thomas Munro > > src/test/recovery/t/039_end_of_wal.pl | 460 ++++++++++++++++++++++++++++++++ > > I haven't investigated the details yet, and it's not affecting the > builds on apt.postgresql.org, but the Debian amd64 and i386 regression > tests just failed this test on PG13 (11 and 15 are ok): 12 and 14 are also failing, now on Debian unstable. (Again, only in the salsa.debian.org tests, not on apt.postgresql.org's buildds.) 12 amd64: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898857 12 i386: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898858 The tests there are running in Docker containers. Christoph
On Sat, Nov 11, 2023 at 10:42 AM Christoph Berg <myon@debian.org> wrote: > > I haven't investigated the details yet, and it's not affecting the > > builds on apt.postgresql.org, but the Debian amd64 and i386 regression > > tests just failed this test on PG13 (11 and 15 are ok): > > 12 and 14 are also failing, now on Debian unstable. (Again, only in > the salsa.debian.org tests, not on apt.postgresql.org's buildds.) > > 12 amd64: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898857 > 12 i386: https://salsa.debian.org/postgresql/postgresql/-/jobs/4898858 > > The tests there are running in Docker containers. Hmm. regress_log_039_end_of_wal says: No such file or directory at /builds/postgresql/postgresql/debian/output/source_dir/build/../src/test/perl/TestLib.pm line 655. In the 13 branch we see that's in the new scan_server_header() subroutine where it tries to open the header, after asking pg_config --includedir-server where it lives. Hmm...
Re: Thomas Munro > In the 13 branch we see that's in the new scan_server_header() > subroutine where it tries to open the header, after asking pg_config > --includedir-server where it lives. Hmm... It's no ok to use pg_config at test time since `make install` might not have been called yet: https://www.postgresql.org/message-id/2023925.1644591595@sss.pgh.pa.us https://www.postgresql.org/message-id/YqkV/hoi2SX91it8@paquier.xyz Christoph
On Sun, Nov 12, 2023 at 12:53 AM Christoph Berg <myon@debian.org> wrote: > Re: Thomas Munro > > In the 13 branch we see that's in the new scan_server_header() > > subroutine where it tries to open the header, after asking pg_config > > --includedir-server where it lives. Hmm... > > It's no ok to use pg_config at test time since `make install` might > not have been called yet: > > https://www.postgresql.org/message-id/2023925.1644591595@sss.pgh.pa.us > https://www.postgresql.org/message-id/YqkV/hoi2SX91it8@paquier.xyz [CC'ing Michael who was involved in that analysis and who also wrote those bits of this commit] We don't have an installation into the final --prefix, but we have tmp_install, surely? And the tests are run with PATH set to point to tmp_install's bin directory. It looks like it did actually find a pg_config executable because otherwise we'd have hit die "could not execute pg_config" and failed sooner. So now I'm wondering if the pg_config it found gives the wrong answer for --includedir-server, because of Debian's local patches that insert a major version into some paths. For example, maybe it's trying to look for access/xlog_internal.h under tmp_install/usr/include/postgresql/server when it's really under tmp_install/usr/include/postgresql/13/server, or vice versa. But then why does that only happen on the salsa build, not on the apt.postgresql.org one?
Re: Thomas Munro > But then why does that only happen on the salsa build, > not on the apt.postgresql.org one? The build chroots there have postgresql-NN already installed so extension builds don't have to download 7 PG versions over and over. My guess would be that that's the difference and it's using some pg_config from /usr/bin or /usr/lib/postgresql/*/bin. I can confirm that it's also failing in my local chroots if none of the postgresql-* packages are preinstalled. Christoph
On Sun, Nov 12, 2023 at 7:27 AM Christoph Berg <myon@debian.org> wrote: > I can confirm that it's also failing in my local chroots if none of > the postgresql-* packages are preinstalled. In your chroot after it fails, can you please find xlog_internal.h somewhere under tmp_install and tell us the full path, and can you find pg_config (however many of them there might be, I'm a little confused on where and when Debian creates extra versioned variants) and tell us the full path, and also what --includedir-server prints, and can you also find regress_log_039_end_of_wal and confirm that it contains a complaint about being unable to open a file, not a complaint about being unable to execute pg_config?
Hi, On 2023-11-12 07:17:02 +1300, Thomas Munro wrote: > On Sun, Nov 12, 2023 at 12:53 AM Christoph Berg <myon@debian.org> wrote: > > Re: Thomas Munro > > > In the 13 branch we see that's in the new scan_server_header() > > > subroutine where it tries to open the header, after asking pg_config > > > --includedir-server where it lives. Hmm... > > > > It's no ok to use pg_config at test time since `make install` might > > not have been called yet: > > > > https://www.postgresql.org/message-id/2023925.1644591595@sss.pgh.pa.us > > https://www.postgresql.org/message-id/YqkV/hoi2SX91it8@paquier.xyz > > [CC'ing Michael who was involved in that analysis and who also wrote > those bits of this commit] > > We don't have an installation into the final --prefix, but we have > tmp_install, surely? Yea, that should work and does work locally. I guess it'd fail though, if you somehow ran the tests with NO_TEMP_INSTALL=1 or such. > And the tests are run with PATH set to point to > tmp_install's bin directory. It looks like it did actually find a > pg_config executable because otherwise we'd have hit die "could not > execute pg_config" and failed sooner. So now I'm wondering if the > pg_config it found gives the wrong answer for --includedir-server, > because of Debian's local patches that insert a major version into > some paths. From the second link above, the problem might more be that debian pg_config is patched to not be relocatable (huh?) - so it'd return an absolute path into the final non-DESTDIR path. Which would fail, because the file isn't installed yet. If that's the case, does that also mean that all the tests that are selectively enabled using check_pg_config() don't work in the debian context? I assume the reason to not use the source-tree access/xlog_internal.h here was just that it's nontrivial to find the top of the source tree form tap tests? It's not hard to make it available... And as we already pass in top_builddir, it doesn't actually measurably further weaken usability of the tap framework in the pgxs context. > For example, maybe it's trying to look for access/xlog_internal.h under > tmp_install/usr/include/postgresql/server when it's really under > tmp_install/usr/include/postgresql/13/server, or vice versa. But then why > does that only happen on the salsa build, not on the apt.postgresql.org one? Christoph, can you apply a patch to emit a bit more information in that environment? diff --git i/src/test/perl/PostgreSQL/Test/Utils.pm w/src/test/perl/PostgreSQL/Test/Utils.pm index cd86897580c..3c588a41755 100644 --- i/src/test/perl/PostgreSQL/Test/Utils.pm +++ w/src/test/perl/PostgreSQL/Test/Utils.pm @@ -722,7 +722,8 @@ sub scan_server_header chomp($stdout); $stdout =~ s/\r$//; - open my $header_h, '<', "$stdout/$header_path" or die "$!"; + my $fname = "$stdout/$header_path"; + open my $header_h, '<', "$fname" or die "could not open $fname: $!"; my @match = undef; while (<$header_h>) would be helpful. I guess we should also apply something like that to our tree - printing a stringified errno without even a path/filename isn't very useful. Greetings, Andres Freund
That's a lot of questions :). Let me try: > In your chroot after it fails, can you please find xlog_internal.h > somewhere under tmp_install and tell us the full path, and can you ./build/tmp_install/usr/include/postgresql/13/server/access/xlog_internal.h ./src/include/access/xlog_internal.h > find pg_config (however many of them there might be, I'm a little > confused on where and when Debian creates extra versioned variants) That's only after testing and `make install`: https://salsa.debian.org/postgresql/postgresql-common/-/blob/master/server/postgresql.mk#L219-225 > and tell us the full path, ./build/tmp_install/usr/lib/postgresql/13/bin/pg_config ./build/src/bin/pg_config/pg_config > and also what --includedir-server prints, $ ./build/tmp_install/usr/lib/postgresql/13/bin/pg_config --includedir-server /usr/include/postgresql/13/server > and can you also find regress_log_039_end_of_wal and confirm that it > contains a complaint about being unable to open a file, not a > complaint about being unable to execute pg_config? $ cat ./build/src/test/recovery/tmp_check/log/regress_log_039_end_of_wal No such file or directory at /home/myon/projects/postgresql/debian/13/build/../src/test/perl/TestLib.pm line 655. The 13-bullseye version of the package still has the "don't relocate me" patch: https://salsa.debian.org/postgresql/postgresql/-/blob/13-bullseye/debian/patches/50-per-version-dirs.patch?ref_type=heads The PGBINDIR mangling is exactly what is breaking the use case now. The commit that removed that bit in the 15 branch explains why it was there: https://salsa.debian.org/postgresql/postgresql/-/commit/a249c75e86fe8733b11c47630e4931c5c196e8da I can (and should) do the change also in the other branches, but from the 2022 discussion, I had the impression there were more reasons to prefer static paths instead of calling pg_config from tmp_install. After all, this seems to be the only 2nd case of actually calling pg_config from tests if I'm grepping for the right things - the other is check_pg_config() called from test/ssl/t/002_scram.pl. (I wonder why that's not failing as well.) Christoph
On Sun, Nov 12, 2023 at 9:03 AM Christoph Berg <myon@debian.org> wrote: > The PGBINDIR mangling is exactly what is breaking the use case now. > The commit that removed that bit in the 15 branch explains why it was > there: > > https://salsa.debian.org/postgresql/postgresql/-/commit/a249c75e86fe8733b11c47630e4931c5c196e8da > > I can (and should) do the change also in the other branches, but from > the 2022 discussion, I had the impression there were more reasons to > prefer static paths instead of calling pg_config from tmp_install. No opinion on potential advantages to other approaches, but I don't see why this way shouldn't be expected to work. So I hope you can drop that diff. > After all, this seems to be the only 2nd case of actually calling > pg_config from tests if I'm grepping for the right things - the other > is check_pg_config() called from test/ssl/t/002_scram.pl. (I wonder > why that's not failing as well.) Maybe you aren't running the SSL tests?
On Sun, Nov 12, 2023 at 12:17:54PM +1300, Thomas Munro wrote: > No opinion on potential advantages to other approaches, but I don't > see why this way shouldn't be expected to work. So I hope you can > drop that diff. Another thing that could be done in stable branches is just to switch 039_end_of_wal.pl to hardcoded values for $XLP_PAGE_MAGIC and $XLP_FIRST_IS_CONTRECORD. This is not going to change in a released version anyway, so there's no real maintenance cost. -- Michael