Thread: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

[HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

From
Michael Paquier
Date:
On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao <fujii@postgresql.org> wrote:
> Fix an assertion failure related to an exclusive backup.
>
> Previously multiple sessions could execute pg_start_backup() and
> pg_stop_backup() to start and stop an exclusive backup at the same time.
> This could trigger the assertion failure of
> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
> This happened because, even while pg_start_backup() was starting
> an exclusive backup, another session could run pg_stop_backup()
> concurrently and mark the backup as not-in-progress unconditionally.
>
> This patch introduces ExclusiveBackupState indicating the state of
> an exclusive backup. This state is used to ensure that there is only
> one session running pg_start_backup() or pg_stop_backup() at
> the same time, to avoid the assertion failure.

Please note that this commit message is not completely accurate. This fix
does not only avoid triggering this assertion failure, it also makes
sure that no manual on-disk intervention is needed by the user to
remove a backup_label file after a failure of pg_stop_backup(). Before
this patch, the exclusive backup counter in XLogCtl got decremented
before backup_label was removed. If an error occurred after the
counter was decremented, the shared memory counter would have been at
0 with a backup_label file still on disk. Subsequent attempts to run
pg_start_backup() would have failed, and putting the system back into
a consistent state would have required an operator to remove the
backup_label file by hand. The heart of the logic here is in the
callback of pg_stop_backup(), which handles an error happening during
the deletion of the backup_label file.
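
The ordering problem described above can be modeled with a small C
sketch. This is a simplified, single-process model and not the code in
xlog.c: the Cluster struct, the BACKUP_* values, and the model_* and
old_* functions are all hypothetical names chosen for illustration, and
the unlink_fails flag stands in for an error raised while deleting
backup_label. It only aims to show why resetting the shared state after
the file removal, with an error callback that restores the in-progress
state, avoids the stuck situation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the exclusive backup state in shared memory. */
typedef enum
{
    BACKUP_NONE,
    BACKUP_STARTING,
    BACKUP_IN_PROGRESS,
    BACKUP_STOPPING
} BackupState;

typedef struct
{
    BackupState state;          /* models the shared-memory state */
    bool        label_on_disk;  /* models the backup_label file */
} Cluster;

/* Fails if a backup is starting, running or stopping, or if a label exists. */
bool
model_start_backup(Cluster *c)
{
    if (c->state != BACKUP_NONE || c->label_on_disk)
        return false;
    c->state = BACKUP_STARTING;
    c->label_on_disk = true;        /* "write" backup_label */
    c->state = BACKUP_IN_PROGRESS;
    return true;
}

/* Error callback of the stop path: put the state back to in-progress. */
void
model_stop_abort(Cluster *c)
{
    assert(c->state == BACKUP_STOPPING);
    c->state = BACKUP_IN_PROGRESS;
}

/* Patched ordering: remove the label first, reset the state last. */
bool
model_stop_backup(Cluster *c, bool unlink_fails)
{
    if (c->state != BACKUP_IN_PROGRESS)
        return false;
    c->state = BACKUP_STOPPING;
    if (unlink_fails)
    {
        model_stop_abort(c);        /* state and label stay consistent */
        return false;
    }
    c->label_on_disk = false;       /* last step that can fail */
    c->state = BACKUP_NONE;         /* reset immediately afterwards */
    return true;
}

/* Pre-patch ordering: the shared state is cleared before the removal. */
bool
old_stop_backup(Cluster *c, bool unlink_fails)
{
    if (c->state != BACKUP_IN_PROGRESS)
        return false;
    c->state = BACKUP_NONE;         /* counter decremented first */
    if (unlink_fails)
        return false;               /* label is left behind on disk */
    c->label_on_disk = false;
    return true;
}
```

In this model, a failed model_stop_backup() leaves the cluster in the
in-progress state so the stop can simply be retried, whereas a failed
old_stop_backup() leaves the state at none with the label still present,
and every later model_start_backup() call is refused until the label is
cleared by hand.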
-- 
Michael



Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

From
Fujii Masao
Date:
On Tue, Jan 17, 2017 at 10:37 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Jan 17, 2017 at 5:40 PM, Fujii Masao <fujii@postgresql.org> wrote:
>> Fix an assertion failure related to an exclusive backup.
>>
>> Previously multiple sessions could execute pg_start_backup() and
>> pg_stop_backup() to start and stop an exclusive backup at the same time.
>> This could trigger the assertion failure of
>> "FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)".
>> This happened because, even while pg_start_backup() was starting
>> an exclusive backup, another session could run pg_stop_backup()
>> concurrently and mark the backup as not-in-progress unconditionally.
>>
>> This patch introduces ExclusiveBackupState indicating the state of
>> an exclusive backup. This state is used to ensure that there is only
>> one session running pg_start_backup() or pg_stop_backup() at
>> the same time, to avoid the assertion failure.
>
> Please note that this commit message is not completely accurate. This fix
> does not only avoid triggering this assertion failure, it also makes
> sure that no manual on-disk intervention is needed by the user to
> remove a backup_label file after a failure of pg_stop_backup(). Before
> this patch, the exclusive backup counter in XLogCtl got decremented
> before backup_label was removed. If an error occurred after the
> counter was decremented, the shared memory counter would have been at
> 0 with a backup_label file still on disk. Subsequent attempts to run
> pg_start_backup() would have failed, and putting the system back into
> a consistent state would have required an operator to remove the
> backup_label file by hand. The heart of the logic here is in the
> callback of pg_stop_backup(), which handles an error happening during
> the deletion of the backup_label file.

With the patch, what happens if pg_stop_backup() exits with an error
after removing the backup_label file but before resetting the backup
state to none?

Regards,

-- 
Fujii Masao



Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix an assertion failure related to an exclusive backup.

From
Michael Paquier
Date:
On Tue, Jan 17, 2017 at 11:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> With the patch, what happens if pg_stop_backup() exits with an error
> after removing the backup_label file but before resetting the backup
> state to none?

Removing the backup_label file is the last step that can raise an
error while the callback is set, and the counter is reset immediately
afterwards.
-- 
Michael