Backup command and functions can cause assertion failure and segmentation fault

From: Fujii Masao
Subject: Backup command and functions can cause assertion failure and segmentation fault
Msg-id: 3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com
Replies: Re: Backup command and functions can cause assertion failure and segmentation fault  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Backup command and functions can cause assertion failure and segmentation fault  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers

Hi,

I found that an assertion failure and a segmentation fault can
happen when running pg_backup_start(), pg_backup_stop() and the
BASE_BACKUP replication command, in v15 or before.

Here is the procedure to reproduce the assertion failure.

1. Connect to the server as a REPLICATION user who has been granted
    EXECUTE on pg_backup_start() and pg_backup_stop().

     $ psql
     =# CREATE ROLE foo REPLICATION LOGIN;
     =# GRANT EXECUTE ON FUNCTION pg_backup_start TO foo;
     =# GRANT EXECUTE ON FUNCTION pg_backup_stop TO foo;
     =# \q

     $ psql "replication=database user=foo dbname=postgres"

2. Run pg_backup_start() and pg_backup_stop().

     => SELECT pg_backup_start('test', true);
     => SELECT pg_backup_stop();

3. Run the BASE_BACKUP replication command with a small MAX_RATE so
    that it takes a long time to finish.

     => BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);

4. Terminate the replication connection while it's running BASE_BACKUP.

     $ psql
     =# SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE backend_type = 'walsender';

This procedure can cause the following assertion failure.

TRAP: FailedAssertion("XLogCtl->Insert.runningBackups > 0", File: "xlog.c", Line: 8779, PID: 69434)
0   postgres                            0x000000010ab2ff7f ExceptionalCondition + 223
1   postgres                            0x000000010a455126 do_pg_abort_backup + 102
2   postgres                            0x000000010a8e13aa shmem_exit + 218
3   postgres                            0x000000010a8e11ed proc_exit_prepare + 125
4   postgres                            0x000000010a8e10f3 proc_exit + 19
5   postgres                            0x000000010ab3171c errfinish + 1100
6   postgres                            0x000000010a91fa80 ProcessInterrupts + 1376
7   postgres                            0x000000010a886907 throttle + 359
8   postgres                            0x000000010a88675d bbsink_throttle_archive_contents + 29
9   postgres                            0x000000010a885aca bbsink_archive_contents + 154
10  postgres                            0x000000010a885a2a bbsink_forward_archive_contents + 218
11  postgres                            0x000000010a884a99 bbsink_progress_archive_contents + 89
12  postgres                            0x000000010a881aba bbsink_archive_contents + 154
13  postgres                            0x000000010a881598 sendFile + 1816
14  postgres                            0x000000010a8806c5 sendDir + 3573
15  postgres                            0x000000010a8805d9 sendDir + 3337
16  postgres                            0x000000010a87e262 perform_base_backup + 1250
17  postgres                            0x000000010a87c734 SendBaseBackup + 500
18  postgres                            0x000000010a89a7f8 exec_replication_command + 1144
19  postgres                            0x000000010a92319a PostgresMain + 2154
20  postgres                            0x000000010a82b702 BackendRun + 50
21  postgres                            0x000000010a82acfc BackendStartup + 524
22  postgres                            0x000000010a829b2c ServerLoop + 716
23  postgres                            0x000000010a827416 PostmasterMain + 6470
24  postgres                            0x000000010a703e19 main + 809
25  libdyld.dylib                       0x00007fff2072ff3d start + 1


Here is the procedure to reproduce the segmentation fault.

1. Connect to the server as a REPLICATION user who has been granted
    EXECUTE on pg_backup_stop().

     $ psql
     =# CREATE ROLE foo REPLICATION LOGIN;
     =# GRANT EXECUTE ON FUNCTION pg_backup_stop TO foo;
     =# \q

     $ psql "replication=database user=foo dbname=postgres"

2. Run the BASE_BACKUP replication command with a small MAX_RATE so
    that it takes a long time to finish.

     => BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);

3. Press Ctrl-C to cancel BASE_BACKUP while it's running.

4. Run pg_backup_stop().

     => SELECT pg_backup_stop();

This procedure can cause the following segmentation fault.

     LOG:  server process (PID 69449) was terminated by signal 11: Segmentation fault: 11
     DETAIL:  Failed process was running: SELECT pg_backup_stop();


The root cause of these failures seems to be that the
sessionBackupState flag is not reset to SESSION_BACKUP_NONE even when
BASE_BACKUP is aborted. So the attached patch changes the
do_pg_abort_backup callback so that it resets sessionBackupState.
I confirmed that, with the patch, the assertion failure and the
segmentation fault no longer happen.
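
To illustrate why a stale flag matters, here is a toy model in Python.
It is only a sketch, not PostgreSQL code: the Session class and the
abort_twice() helper are hypothetical, the counter is kept per session
for simplicity, and the model assumes that the abort callback can end
up being invoked more than once for the same session.

```python
# Toy model only: names echo those in this report, but this is not PostgreSQL code.
SESSION_BACKUP_NONE = "none"
SESSION_BACKUP_RUNNING = "running"


class Session:
    def __init__(self, reset_on_abort):
        self.running_backups = 0              # stands in for XLogCtl->Insert.runningBackups
        self.state = SESSION_BACKUP_NONE      # stands in for sessionBackupState
        self.reset_on_abort = reset_on_abort  # True models the patched callback

    def start_backup(self):
        self.running_backups += 1
        self.state = SESSION_BACKUP_RUNNING

    def abort_backup(self):
        """Stands in for the do_pg_abort_backup callback."""
        if self.state != SESSION_BACKUP_RUNNING:
            return  # nothing to clean up for this session
        if self.running_backups <= 0:
            raise AssertionError("runningBackups > 0")
        self.running_backups -= 1
        if self.reset_on_abort:
            self.state = SESSION_BACKUP_NONE


def abort_twice(reset_on_abort):
    """Start one backup, then run the abort callback twice (assumed scenario)."""
    session = Session(reset_on_abort)
    session.start_backup()
    session.abort_backup()
    try:
        session.abort_backup()
    except AssertionError:
        return "assertion failed"
    return "ok"
```

In this model abort_twice(False) trips the check on the second call
because the flag still says a backup is running, while abort_twice(True)
returns early on the second call.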

But this change has one issue: if BASE_BACKUP is run while a backup
started by pg_backup_start() is already in progress in the session,
and that session is then terminated, the change causes
XLogCtl->Insert.runningBackups to be decremented incorrectly. That is,
XLogCtl->Insert.runningBackups is incremented twice, once by
pg_backup_start() and once by BASE_BACKUP, but it's decremented only
once when the session terminates.

To address this issue, I think that we should disallow BASE_BACKUP
from running while a backup is already in progress in the *same*
session, as we already do for pg_backup_start(). Thoughts? I included
the code to disallow that in the attached patch.
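
The counter imbalance and the proposed restriction can be sketched
with another toy model in Python. Again this is only an illustration
under assumptions, not the attached patch: Server, Session, the
forbid_nested switch and leaked_backups() are hypothetical names, and
terminate() stands in for an abort callback that decrements once and
resets the flag.

```python
# Toy model only: not PostgreSQL code and not the attached patch.
SESSION_BACKUP_NONE = "none"
SESSION_BACKUP_RUNNING = "running"


class Server:
    def __init__(self):
        self.running_backups = 0  # stands in for XLogCtl->Insert.runningBackups


class Session:
    def __init__(self, server, forbid_nested):
        self.server = server
        self.state = SESSION_BACKUP_NONE    # stands in for sessionBackupState
        self.forbid_nested = forbid_nested  # True models the proposed restriction

    def _start(self):
        self.server.running_backups += 1
        self.state = SESSION_BACKUP_RUNNING

    def pg_backup_start(self):
        # Already refuses a second backup in the same session.
        if self.state == SESSION_BACKUP_RUNNING:
            raise RuntimeError("a backup is already in progress in this session")
        self._start()

    def base_backup(self):
        # Proposed: refuse BASE_BACKUP too when the session has a backup running.
        if self.forbid_nested and self.state == SESSION_BACKUP_RUNNING:
            raise RuntimeError("a backup is already in progress in this session")
        self._start()

    def terminate(self):
        # Models the abort callback with the flag reset: one decrement per session.
        if self.state == SESSION_BACKUP_RUNNING:
            self.server.running_backups -= 1
            self.state = SESSION_BACKUP_NONE


def leaked_backups(forbid_nested):
    """Run pg_backup_start, then BASE_BACKUP, then terminate; return the leftover count."""
    server = Server()
    session = Session(server, forbid_nested)
    session.pg_backup_start()
    try:
        session.base_backup()
    except RuntimeError:
        pass  # BASE_BACKUP was rejected by the restriction
    session.terminate()
    return server.running_backups
```

In this model leaked_backups(False) leaves the counter at 1 after the
session is gone, while leaked_backups(True) leaves it at 0 because the
second increment never happens.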

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION