Обсуждение: [HACKERS] PostgresNode::poll_query_until hacking

Поиск
Список
Период
Сортировка

[HACKERS] PostgresNode::poll_query_until hacking

От
Tom Lane
Дата:
The attached proposed patch changes the TAP test infrastructure's
poll_query_until function in two ways:

1. An optional argument is added to allow specifying the query result
value we're waiting for, overriding the normal "t".  This allows
folding a handwritten delay loop in 007_sync_rep.pl into the
poll_query_until ecosystem.  As far as I've found, there are no other
handwritten delay loops in the TAP tests.

2. poll_query_until is modified to probe 10X per second not once
per second, in keeping with the changes I've been making elsewhere
to remove not-carefully-analyzed 1s delays in the regression tests.

On my workstation, the reduced probe delay shaves a useful amount
of time off the recovery and subscription regression tests.  I also
tried it on dromedary, which is about the slowest hardware I'd care
to run the TAP tests on regularly, and it seems to be about a wash
there --- some tests get faster, but some get slower, presumably due
to more overhead from the probe queries.

I notice that buildfarm member skink (which runs with valgrind)
recently failed a test run due to poll_query_until timing out:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-06-30%2000%3A50%3A01

I'm inclined to respond to that either by increasing the fixed
180-second timeout, or by making it configurable from an environment
variable (which Andres would then have to add to skink's configuration).

Thoughts?

            regards, tom lane

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index f4fa600..4346423 100644
*** a/src/test/perl/PostgresNode.pm
--- b/src/test/perl/PostgresNode.pm
*************** sub psql
*** 1213,1248 ****

  =pod

! =item $node->poll_query_until(dbname, query)

! Run a query once a second, until it returns 't' (i.e. SQL boolean true).
! Continues polling if psql returns an error result. Times out after 180 seconds.

  =cut

  sub poll_query_until
  {
!     my ($self, $dbname, $query) = @_;

!     my $max_attempts = 180;
!     my $attempts     = 0;
      my ($stdout, $stderr);

      while ($attempts < $max_attempts)
      {
-         my $cmd =
-           [ 'psql', '-XAt', '-c', $query, '-d', $self->connstr($dbname) ];
          my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;

          chomp($stdout);
          $stdout =~ s/\r//g if $TestLib::windows_os;
!         if ($stdout eq "t")
          {
              return 1;
          }

!         # Wait a second before retrying.
!         sleep 1;
          $attempts++;
      }

--- 1213,1255 ----

  =pod

! =item $node->poll_query_until($dbname, $query [, $expected ])

! Run B<$query> repeatedly, until it returns the B<$expected> result
! ('t', or SQL boolean true, by default).
! Continues polling if B<psql> returns an error result.
! Times out after 180 seconds.
! Returns 1 if successful, 0 if timed out.

  =cut

  sub poll_query_until
  {
!     my ($self, $dbname, $query, $expected) = @_;

!     $expected = 't' unless defined($expected);    # default value
!
!     my $cmd =
!         [ 'psql', '-XAt', '-c', $query, '-d', $self->connstr($dbname) ];
      my ($stdout, $stderr);
+     my $max_attempts = 180 * 10;
+     my $attempts     = 0;

      while ($attempts < $max_attempts)
      {
          my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;

          chomp($stdout);
          $stdout =~ s/\r//g if $TestLib::windows_os;
!
!         if ($stdout eq $expected)
          {
              return 1;
          }

!         # Wait 0.1 second before retrying.
!         select undef, undef, undef, 0.1;
!
          $attempts++;
      }

diff --git a/src/test/recovery/t/007_sync_rep.pl b/src/test/recovery/t/007_sync_rep.pl
index 8e3cc5e..0f999f0 100644
*** a/src/test/recovery/t/007_sync_rep.pl
--- b/src/test/recovery/t/007_sync_rep.pl
*************** use Test::More tests => 11;
*** 9,15 ****
  my $check_sql =
  "SELECT application_name, sync_priority, sync_state FROM pg_stat_replication ORDER BY application_name;";

! # Check that sync_state of each standby is expected.
  # If $setting is given, synchronous_standby_names is set to it and
  # the configuration file is reloaded before the test.
  sub test_sync_state
--- 9,15 ----
  my $check_sql =
  "SELECT application_name, sync_priority, sync_state FROM pg_stat_replication ORDER BY application_name;";

! # Check that sync_state of each standby is expected (waiting till it is).
  # If $setting is given, synchronous_standby_names is set to it and
  # the configuration file is reloaded before the test.
  sub test_sync_state
*************** sub test_sync_state
*** 23,46 ****
          $self->reload;
      }

!     my $timeout_max = 30;
!     my $timeout     = 0;
!     my $result;
!
!     # A reload may take some time to take effect on busy machines,
!     # hence use a loop with a timeout to give some room for the test
!     # to pass.
!     while ($timeout < $timeout_max)
!     {
!         $result = $self->safe_psql('postgres', $check_sql);
!
!         last if ($result eq $expected);
!
!         $timeout++;
!         sleep 1;
!     }
!
!     is($result, $expected, $msg);
  }

  # Initialize master node
--- 23,29 ----
          $self->reload;
      }

!     ok( $self->poll_query_until('postgres', $check_sql, $expected), $msg);
  }

  # Initialize master node

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgresNode::poll_query_until hacking

От
Michael Paquier
Дата:
On Sun, Jul 2, 2017 at 4:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The attached proposed patch changes the TAP test infrastructure's
> poll_query_until function in two ways:
>
> 1. An optional argument is added to allow specifying the query result
> value we're waiting for, overriding the normal "t".  This allows
> folding a handwritten delay loop in 007_sync_rep.pl into the
> poll_query_until ecosystem.

True that in this test the expected output can be quite complicated,
so turning that into a query that returned 't' would make the code
unreadable.

> As far as I've found, there are no other
> handwritten delay loops in the TAP tests.

Good catch. Yes here using a poll_query_until call makes sense.

> 2. poll_query_until is modified to probe 10X per second not once
> per second, in keeping with the changes I've been making elsewhere
> to remove not-carefully-analyzed 1s delays in the regression tests.
>
> On my workstation, the reduced probe delay shaves a useful amount
> of time off the recovery and subscription regression tests.  I also
> tried it on dromedary, which is about the slowest hardware I'd care
> to run the TAP tests on regularly, and it seems to be about a wash
> there --- some tests get faster, but some get slower, presumably due
> to more overhead from the probe queries.

Check.

> Thoughts?

-   is($result, $expected, $msg);
+   ok( $self->poll_query_until('postgres', $check_sql, $expected), $msg);
A matter of taste here: using a space after "ok(".

If you would like to get some feedback from me, waiting until Monday
morning my time (Sunday evening yours) before pushing patches would be
better.
-- 
Michael



Re: [HACKERS] PostgresNode::poll_query_until hacking

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> If you would like to get some feedback from me, waiting until Monday
> morning my time (Sunday evening yours) before pushing patches would be
> better.

If you have ideas for improvement, it's always possible to amend the
code later.  I've been pushing this stuff to see what happens in the
buildfarm, not to foreclose discussion.
        regards, tom lane