Re: Strange decreasing value of pg_last_wal_receive_lsn()

From: godjan •
Subject: Re: Strange decreasing value of pg_last_wal_receive_lsn()
Date:
Msg-id: D3A6D0DE-A8C7-4E3A-A1B6-406C53662928@gmail.com
In reply to: Re: Strange decreasing value of pg_last_wal_receive_lsn() (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Replies: Re: Strange decreasing value of pg_last_wal_receive_lsn() (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List: pgsql-hackers
> Why do you kill -9 your standby?
Hi, it’s a Jepsen test for our HA solution. It checks that we don’t lose data in such a situation.

So we have updated our logic as Michael suggested. All HA-alive standbys now wait until they have replayed all the WAL they hold,
and afterwards we use pg_last_wal_replay_lsn() to choose which standby will be promoted during failover.

That fixed our problem, but there is another one: now we have to wait for all HA-alive hosts to finish replaying WAL
before we can fail over. That might take a while (for example, if the WAL contains a record for a B-tree page split).

We are looking for a way to find the standby that contains all the data, and then replay all WAL only on that
standby before failover.

Do you have any ideas on how to preserve the last actual value of pg_last_wal_receive_lsn()? As I understand it, the WAL receiver
doesn’t write walrcv->flushedUpto to disk.

> On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>
>
> (too bad the history has been removed to keep context)
>
> On Fri, 8 May 2020 15:02:26 +0500
> godjan • <g0dj4n@gmail.com> wrote:
>
>> I got it, thank you.
>> Can you recommend what to use to determine which quorum standby should be
>> promoted in such a case? We planned to use pg_last_wal_receive_lsn() to
>> determine which has fresh data but if it returns the beginning of the segment
>> on both replicas we can’t determine which standby confirmed that write
>> transaction to disk.
>
> Wait, pg_last_wal_receive_lsn() only decreases because you killed your standby.
>
> pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The latter
> is set to the beginning of the segment requested only during the first
> walreceiver startup or a timeline fork:
>
>     /*
>      * If this is the first startup of walreceiver (on this timeline),
>      * initialize flushedUpto and latestChunkStart to the starting point.
>      */
>     if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli)
>     {
>         walrcv->flushedUpto = recptr;
>         walrcv->receivedTLI = tli;
>         walrcv->latestChunkStart = recptr;
>     }
>     walrcv->receiveStart = recptr;
>     walrcv->receiveStartTLI = tli;
>
> After a primary loss, as long as the standbys are up and running, it is fine
> to use pg_last_wal_receive_lsn().
>
> Why do you kill -9 your standby? What am I missing? Could you explain the
> use case you are working on to justify this?
>
> Regards,



