Обсуждение: SIGPIPE in TAP tests
Two buildfarm runs[1][2] from the last 90 days have failed in src/test/authentication, like this: t/001_password.pl .. Failed 3/8 subtests t/002_saslprep.pl .. ok 1815 ms ( 0.00 usr 0.00 sys + 0.89 cusr 0.26 csys = 1.15 CPU) Test Summary Report ------------------- t/001_password.pl (Wstat: 13 Tests: 5 Failed: 0) Non-zero wait status: 13 Parse errors: Bad plan. You planned 8 tests but ran 5. Files=2, Tests=17, 3 wallclock secs ( 0.04 usr 0.00 sys + 1.67 cusr 0.50 csys = 2.21 CPU) Result: FAIL Compared to a good run, the other logs just end suddenly after the expected "FATAL: password authentication failed". "Wstat: 13" means the Perl process died to signal 13 (SIGPIPE). This test invokes psql in a way that fails authentication, and it writes "SELECT 1" to psql's stdin. The SIGPIPE happens if the psql process exits before that write. I can reproduce this consistently by patching a delay into IPC::Run: --- IPC/Run.pm~ 2017-05-12 06:46:43.000000000 -0700 +++ IPC/Run.pm 2017-12-09 00:24:06.876132628 -0800 @@ -1449,6 +1449,8 @@ sub run { local $in_run = 1; ## Allow run()-only optimizations. my IPC::Run $self = start(@_); + use Time::HiRes 'usleep'; + usleep 100_000; The two src/test/authentication tests then fail, but nothing else fails. Let's ignore SIGPIPE in all TAP tests, which leaves some evidence in regress_log_001_password: ack Broken pipe: write( 13, 'SELECT 1' ) at /home/nm/sw/cpan/lib/perl5/IPC/Run/IO.pm line 549. To fix the actual failures, we can cease sending "SELECT 1"; it's enough to disconnect immediately. Patch attached. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2017-11-23%2008%3A37%3A17 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2017-10-30%2000%3A14%3A09
Вложения
On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch <noah@leadboat.com> wrote: > Two buildfarm runs[1][2] from the last 90 days have failed in > src/test/authentication, like this: > > t/001_password.pl .. > Failed 3/8 subtests > t/002_saslprep.pl .. ok 1815 ms ( 0.00 usr 0.00 sys + 0.89 cusr 0.26 csys = 1.15 CPU) > > Test Summary Report > ------------------- > t/001_password.pl (Wstat: 13 Tests: 5 Failed: 0) > Non-zero wait status: 13 > Parse errors: Bad plan. You planned 8 tests but ran 5. > Files=2, Tests=17, 3 wallclock secs ( 0.04 usr 0.00 sys + 1.67 cusr 0.50 csys = 2.21 CPU) > Result: FAIL > > Compared to a good run, the other logs just end suddenly after the expected > "FATAL: password authentication failed". "Wstat: 13" means the Perl process > died to signal 13 (SIGPIPE). This test invokes psql in a way that fails > authentication, and it writes "SELECT 1" to psql's stdin. The SIGPIPE happens > if the psql process exits before that write. Nice investigation. An interesting coincidence is that I have looked yesterday at an off-list reported some folks have sent me which is basically what you have here. + # Return EPIPE instead of killing the process with SIGPIPE. An affected + # test may still fail, but it's more likely to report useful facts. + $SIG{PIPE} = 'IGNORE'; With this portion changed I can indeed see something which is more helpful: psql: FATAL: password authentication failed for user "saslpreptest1_role" ack Broken pipe: write( 11, 'SELECT 1' ) at /Library/Perl/5.18/IPC/Run/IO.pm line 558. If SIGPIPE is ignored then test output just stops after generating the FATAL message. Oops. > The two src/test/authentication tests then fail, but nothing else fails. > Let's ignore SIGPIPE in all TAP tests, which leaves some evidence in > regress_log_001_password: > > ack Broken pipe: write( 13, 'SELECT 1' ) at /home/nm/sw/cpan/lib/perl5/IPC/Run/IO.pm line 549. > > To fix the actual failures, we can cease sending "SELECT 1"; it's enough to > disconnect immediately. Patch attached. Perhaps you could use an empty string instead? I feel a bit uneasy about passing an undefined object to IPC::Run::run. -- Michael
On Mon, Dec 11, 2017 at 04:19:52PM +0900, Michael Paquier wrote: > On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch <noah@leadboat.com> wrote: > If SIGPIPE is ignored then test output just stops after generating the > FATAL message. Oops. You mean "If SIGPIPE is not ignored ...", right? > > To fix the actual failures, we can cease sending "SELECT 1"; it's enough to > > disconnect immediately. Patch attached. > > Perhaps you could use an empty string instead? I feel a bit uneasy > about passing an undefined object to IPC::Run::run. IPC::Run documents the equivalence of undef and '' in this context; search for "close a child processes stdin" in http://search.cpan.org/~rbs/IPC-Run-0.78/lib/IPC/Run.pm. Thus, I expect both spellings to work reliably, and I find "undef" slightly more evocative.
On Mon, Dec 11, 2017 at 4:32 PM, Noah Misch <noah@leadboat.com> wrote: > On Mon, Dec 11, 2017 at 04:19:52PM +0900, Michael Paquier wrote: >> On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch <noah@leadboat.com> wrote: > >> If SIGPIPE is ignored then test output just stops after generating the >> FATAL message. Oops. > > You mean "If SIGPIPE is not ignored ...", right? Yes, sorry. >> > To fix the actual failures, we can cease sending "SELECT 1"; it's enough to >> > disconnect immediately. Patch attached. >> >> Perhaps you could use an empty string instead? I feel a bit uneasy >> about passing an undefined object to IPC::Run::run. > > IPC::Run documents the equivalence of undef and '' in this context; search for > "close a child processes stdin" in > http://search.cpan.org/~rbs/IPC-Run-0.78/lib/IPC/Run.pm. Thus, I expect both > spellings to work reliably, and I find "undef" slightly more evocative. Thanks, I missed this bit. No objections to use undef then. -- Michael