Backup command and functions can cause assertion failure and segmentation fault
От | Fujii Masao |
---|---|
Тема | Backup command and functions can cause assertion failure and segmentation fault |
Дата | |
Msg-id | 3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com обсуждение исходный текст |
Ответы |
Re: Backup command and functions can cause assertion failure and segmentation fault
(Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Backup command and functions can cause assertion failure and segmentation fault (Masahiko Sawada <sawada.mshk@gmail.com>) |
Список | pgsql-hackers |
Hi, I found that the assertion failure and the segmentation fault could happen by running pg_backup_start(), pg_backup_stop() and BASE_BACKUP replication command, in v15 or before. Here is the procedure to reproduce the assertion failure. 1. Connect to the server as the REPLICATION user who is granted EXECUTE to run pg_backup_start() and pg_backup_stop(). $ psql =# CREATE ROLE foo REPLICATION LOGIN; =# GRANT EXECUTE ON FUNCTION pg_backup_start TO foo; =# GRANT EXECUTE ON FUNCTION pg_backup_stop TO foo; =# \q $ psql "replication=database user=foo dbname=postgres" 2. Run pg_backup_start() and pg_backup_stop(). => SELECT pg_backup_start('test', true); => SELECT pg_backup_stop(); 3. Run BASE_BACKUP replication command with smaller MAX_RATE so that it can take a long time to finish. => BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32); 4. Terminate the replication connection while it's running BASE_BACKUP. $ psql =# SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE backend_type = 'walsender'; This procedure can cause the following assertion failure. TRAP: FailedAssertion("XLogCtl->Insert.runningBackups > 0", File: "xlog.c", Line: 8779, PID: 69434) 0 postgres 0x000000010ab2ff7f ExceptionalCondition + 223 1 postgres 0x000000010a455126 do_pg_abort_backup + 102 2 postgres 0x000000010a8e13aa shmem_exit + 218 3 postgres 0x000000010a8e11ed proc_exit_prepare + 125 4 postgres 0x000000010a8e10f3 proc_exit + 19 5 postgres 0x000000010ab3171c errfinish + 1100 6 postgres 0x000000010a91fa80 ProcessInterrupts + 1376 7 postgres 0x000000010a886907 throttle + 359 8 postgres 0x000000010a88675d bbsink_throttle_archive_contents + 29 9 postgres 0x000000010a885aca bbsink_archive_contents + 154 10 postgres 0x000000010a885a2a bbsink_forward_archive_contents + 218 11 postgres 0x000000010a884a99 bbsink_progress_archive_contents + 89 12 postgres 0x000000010a881aba bbsink_archive_contents + 154 13 postgres 0x000000010a881598 sendFile + 1816 14 postgres 0x000000010a8806c5 sendDir + 3573 15 postgres 0x000000010a8805d9 sendDir + 3337 16 postgres 0x000000010a87e262 perform_base_backup + 1250 17 postgres 0x000000010a87c734 SendBaseBackup + 500 18 postgres 0x000000010a89a7f8 exec_replication_command + 1144 19 postgres 0x000000010a92319a PostgresMain + 2154 20 postgres 0x000000010a82b702 BackendRun + 50 21 postgres 0x000000010a82acfc BackendStartup + 524 22 postgres 0x000000010a829b2c ServerLoop + 716 23 postgres 0x000000010a827416 PostmasterMain + 6470 24 postgres 0x000000010a703e19 main + 809 25 libdyld.dylib 0x00007fff2072ff3d start + 1 Here is the procedure to reproduce the segmentation fault. 1. Connect to the server as the REPLICATION user who is granted EXECUTE to run pg_backup_stop(). $ psql =# CREATE ROLE foo REPLICATION LOGIN; =# GRANT EXECUTE ON FUNCTION pg_backup_stop TO foo; =# \q $ psql "replication=database user=foo dbname=postgres" 2. Run BASE_BACKUP replication command with smaller MAX_RATE so that it can take a long time to finish. => BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32); 3. Press Ctrl-C to cancel BASE_BACKUP while it's running. 4. Run pg_backup_stop(). => SELECT pg_backup_stop(); This procedure can cause the following segmentation fault. LOG: server process (PID 69449) was terminated by signal 11: Segmentation fault: 11 DETAIL: Failed process was running: SELECT pg_backup_stop(); The root cause of these failures seems that sessionBackupState flag is not reset to SESSION_BACKUP_NONE even when BASE_BACKUP is aborted. So attached patch changes do_pg_abort_backup callback so that it resets sessionBackupState. I confirmed that, with the patch, those assertion failure and segmentation fault didn't happen. But this change has one issue that; if BASE_BACKUP is run while a backup is already in progress in the session by pg_backup_start() and that session is terminated, the change causes XLogCtl->Insert.runningBackups to be decremented incorrectly. That is, XLogCtl->Insert.runningBackups is incremented by two by pg_backup_start() and BASE_BACKUP, but it's decremented only by one by the termination of the session. To address this issue, I think that we should disallow BASE_BACKUP to run while a backup is already in progress in the *same* session as we already do this for pg_backup_start(). Thought? I included the code to disallow that in the attached patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: "shiy.fnst@fujitsu.com"Дата:
Сообщение: RE: Handle infinite recursion in logical replication setup