[MASSMAIL] Resetting synchronous_standby_names can wait for CHECKPOINT to finish

From Yusuke Egashira (Fujitsu)
Subject [MASSMAIL] Resetting synchronous_standby_names can wait for CHECKPOINT to finish
Date
Msg-id TY3PR01MB996612E799EACC4FE1C9C4DCFF092@TY3PR01MB9966.jpnprd01.prod.outlook.com
Responses RE: Resetting synchronous_standby_names can wait for CHECKPOINT to finish  ("Yusuke Egashira (Fujitsu)" <egashira.yusuke@fujitsu.com>)
List pgsql-hackers
Hello, hackers.

When the checkpointer process is busy, backend processes waiting in SyncRep do not resume until the checkpoint
completes, even if synchronous_standby_names is reset.
This prevents application processing from resuming promptly when a problem occurs on the standby server in a
synchronous replication system.
I confirmed this in PostgreSQL 12.18.

This issue has actually become a major problem for our customer.
When a problem occurred in the replication network, the backend processes did not respond even after
synchronous_standby_names was reset, resulting in timeout errors in many client applications.
The customer has set the checkpoint_completion_target parameter to 0.9, and this seems to have worked fine
under normal conditions.
However, at one point VACUUM activity was concentrated on a huge table, and more than five times max_wal_size
of WAL was generated during checkpoint processing.
Unfortunately, communication with the synchronous standby was lost during that checkpoint, and despite
synchronous_standby_names being reset, multiple client applications could not return a response because they
were still waiting in SyncRep.


I wrote a script (reset-synchronous_standby_names-during-checkpoint.sh) to illustrate the issue.
The script stops the synchronous standby during a transaction and then resets synchronous_standby_names during a
checkpoint.
When I run this on my 1-core RHEL7 machine, I see that COMMIT waits until the CHECKPOINT finishes, even though
synchronous_standby_names has been reset.

I am attaching a patch (against REL_12_STABLE) with what seems to be the simplest solution.
It moves the checkpointer's handling of a received SIGHUP out of the code path that performs the sleep.
However, I am concerned that this change could affect checkpoint performance when the checkpoint is behind
schedule.
Can PostgreSQL tolerate this overhead?
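
To make the intent of the patch easier to discuss, here is a small toy model in C. It is only a sketch, not the
attached patch and not PostgreSQL source: ToyCheckpointer, toy_process_reload, toy_write_delay and
toy_writes_until_reload are hypothetical names made up for this illustration. It assumes that the checkpointer
calls a delay routine after each buffer write, that this routine only reaches its sleep branch while the
checkpoint is on schedule, and that the checkpoint stays behind schedule for its whole duration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; none of these names exist in PostgreSQL. */
typedef struct ToyCheckpointer
{
    bool reload_pending;  /* a SIGHUP-style reload request is waiting */
    int  reload_seen_at;  /* write index at which it was handled, -1 if never */
} ToyCheckpointer;

/* Handle a pending reload request, remembering when it was handled. */
static void
toy_process_reload(ToyCheckpointer *cp, int write_index)
{
    if (cp->reload_pending)
    {
        cp->reload_pending = false;
        cp->reload_seen_at = write_index;
    }
}

/*
 * Toy delay routine, called after every buffer write.
 * reload_outside_sleep = false: reload is handled only in the sleep branch.
 * reload_outside_sleep = true:  reload is handled on every call.
 */
static void
toy_write_delay(ToyCheckpointer *cp, int write_index,
                bool on_schedule, bool reload_outside_sleep)
{
    if (reload_outside_sleep)
        toy_process_reload(cp, write_index);

    if (on_schedule)
    {
        if (!reload_outside_sleep)
            toy_process_reload(cp, write_index);
        /* a real process would nap here */
    }
}

/*
 * Simulate one checkpoint of total_writes buffer writes that is behind
 * schedule throughout. The reload request arrives just before write
 * number reload_at. Returns how many writes are issued between the
 * request and the moment the reload is handled.
 */
int
toy_writes_until_reload(int total_writes, int reload_at,
                        bool reload_outside_sleep)
{
    ToyCheckpointer cp = { false, -1 };

    assert(total_writes > 0 && reload_at >= 0 && reload_at < total_writes);

    for (int i = 0; i < total_writes; i++)
    {
        if (i == reload_at)
            cp.reload_pending = true;
        /* buffer i would be written here; assumed behind schedule */
        toy_write_delay(&cp, i, false, reload_outside_sleep);
    }

    /* After the checkpoint, the main loop handles anything still pending. */
    toy_process_reload(&cp, total_writes);
    return cp.reload_seen_at - reload_at;
}
```

In this model, toy_writes_until_reload(1000, 100, false) leaves the request pending for the remaining 900 writes,
while toy_writes_until_reload(1000, 100, true) handles it at the very next call. The cost in the second case is
one extra flag check per write, which is the overhead I am asking about.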

Regards,
Yusuke Egashira.

