Standby catch up state change

From: Pavan Deolasee
Subject: Standby catch up state change
Date:
Msg-id: CABOikdMqc7qdkFqKvNg4HTYb-QjnR3VwY-PdbPq=+q6chRbt4w@mail.gmail.com
Replies: Re: Standby catch up state change  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
Hello,

I wonder if there is an issue with the way the state change from WALSNDSTATE_CATCHUP to WALSNDSTATE_STREAMING happens. Please note that my question is based solely on strange behavior reported by a colleague and on my own limited reading of the code. The colleague is trying out replication with a networking middleware and noticed that the master logs the debug message about the standby catching up, but the write_location in the pg_stat_replication view takes minutes to reflect the actual catch-up location.

ISTM that the following code in walsender.c assumes that the standby has caught up as soon as the master has sent all the required WAL.

    /* Do we have any work to do? */
    Assert(sentPtr <= SendRqstPtr);
    if (SendRqstPtr <= sentPtr)
    {
        *caughtup = true;
        return;
    }

But what if the standby has not yet received all the WAL data sent by the master? This can happen for various reasons, such as buffering at the OS level or in the network layer on the sender machine, or at any intermediate hop.

Should we not instead wait for the standby to have received all the WAL before declaring that it has caught up? If a failure happens while the data is still in the sender's buffer, the standby may not actually catch up to the desired point, contrary to the LOG message displayed on the master.
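
To make the distinction concrete, here is a rough standalone sketch of the two rules. It is not the walsender.c code: WalPos, caught_up_when_sent, caught_up_when_received and the standby_write argument are names I made up for illustration, and I am assuming the write position the standby reports back is what the check would compare against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a WAL position; an assumption for this sketch only. */
typedef uint64_t WalPos;

/*
 * Rule as I read it today: caught up as soon as everything requested
 * has been handed to the send path.
 */
static bool
caught_up_when_sent(WalPos sent, WalPos send_request)
{
    assert(sent <= send_request);
    return send_request <= sent;
}

/*
 * Hypothetical stricter rule: additionally require that the standby
 * has reported writing everything the master sent. standby_write is an
 * assumed input, e.g. the position last reported by the standby.
 */
static bool
caught_up_when_received(WalPos sent, WalPos send_request,
                        WalPos standby_write)
{
    return caught_up_when_sent(sent, send_request) &&
           standby_write >= sent;
}
```

With sent = 100, send_request = 100 and standby_write = 60, the first rule already says "caught up" while the second does not, which matches the gap my colleague observed between the message and write_location.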

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
