RE: Add two missing tests in 035_standby_logical_decoding.pl

From: Yu Shi (Fujitsu)
Subject: RE: Add two missing tests in 035_standby_logical_decoding.pl
Msg-id: OSZPR01MB6310F24417BDD04731BD7978FD659@OSZPR01MB6310.jpnprd01.prod.outlook.com
In reply to: Re: Add two missing tests in 035_standby_logical_decoding.pl ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Replies: Re: Add two missing tests in 035_standby_logical_decoding.pl ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
On Mon, Apr 24, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:
> 
> On 4/24/23 11:45 AM, Amit Kapila wrote:
> > On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
> >> <bertranddrouvot.pg@gmail.com> wrote:
> >>>
> >>
> >> Few comments:
> >> ============
> >>
> >
> > +# We can not test if the WAL file still exists immediately.
> > +# We need to let some time to the standby to actually "remove" it.
> > +my $i = 0;
> > +while (1)
> > +{
> > + last if !-f $standby_walfile;
> > + if ($i++ == 10 * $default_timeout)
> > + {
> > + die
> > +   "could not determine if WAL file has been retained or not, can't continue";
> > + }
> > + usleep(100_000);
> > +}
> >
> > Is this ad hoc wait required because we can't guarantee that the
> > checkpoint is complete on standby even after using wait_for_catchup?
> 
> Yes, the restart point on the standby is not necessarily completed even after
> wait_for_catchup is done.
> 

I think that's because, when replaying a checkpoint record, the startup process
on the standby only records the checkpoint's information, and we then have to
wait for the checkpointer to actually perform the restartpoint (see
RecoveryRestartPoint), right? If so, could we force a checkpoint on the standby
instead? Once that returns, the standby should have completed the restartpoint,
and we wouldn't need to wait.

Also, would it be better to wait for the cascading standby as well? If a WAL
file that the cascading standby still needs is removed on the standby, the
subsequent test will fail. Do we need to consider this scenario? After setting
recovery_min_apply_delay to 5s on the cascading standby, I saw the following
error, and the test failed due to a timeout while waiting for the cascading
standby.

Log of cascading standby node:
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000003 has already been removed
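A WAL segment file name like the one in this error encodes the timeline and the segment number, so it can be mapped back to the LSN range the cascading standby was asking for (for example, to compare it with that node's replay position). Below is a minimal plain-Perl sketch of the naming scheme used by XLogFileName() in xlog_internal.h, assuming timeline 1 and the default 16 MB wal_segment_size; wal_file_name is a hypothetical helper, not part of the PostgreSQL TAP framework.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring XLogFileName(): 8 hex digits of timeline,
# then the segment number split into the number of whole 4 GB "xlogid"
# units and the segment index within that unit.
sub wal_file_name
{
	my ($tli, $lsn, $seg_size) = @_;
	my ($hi, $lo) = split m{/}, $lsn;               # LSN text form "X/Y"
	my $pos = hex($hi) * 4294967296 + hex($lo);     # 64-bit WAL position
	my $segno = int($pos / $seg_size);
	my $segs_per_xlogid = int(4294967296 / $seg_size);
	return sprintf("%08X%08X%08X",
		$tli, int($segno / $segs_per_xlogid), $segno % $segs_per_xlogid);
}

# Assuming 16 MB segments, every LSN from 0/3000000 through 0/3FFFFFF
# falls in the segment named in the FATAL message; both lines print
# 000000010000000000000003.
print wal_file_name(1, '0/3000000', 16 * 1024 * 1024), "\n";
print wal_file_name(1, '0/3FFFFFF', 16 * 1024 * 1024), "\n";
```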

Regards,
Shi Yu
