Test Scenario: 1. INSERT data so tablesync should copy something 2. While paused in LogicalRepSyncTableStart do a DROP TABLE to rip the rug out from under the sync worker! 3. Continue the sync worker.... see what happens All code is from PG HEAD (3/Feb) except the "!!>>" added to allow me to attech to debugger. ============== ## ## Insert data ## [postgres@CentOS7-x64 ~]$ psql -d test_pub -c "INSERT INTO test_tab VALUES(1, 'foo');" INSERT 0 1 ## ## SUBSCRIBE and continue to breakpoint at top of tablesync function LogicalRepSyncTableStart ## [postgres@CentOS7-x64 ~]$ psql -d test_sub -p 54321 -c "CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost dbname=test_pub application_name=tap_sub' PUBLICATION tap_pub;" 2021-02-03 13:42:20.031 AEDT [9555] LOG: logical decoding found consistent point at 0/16605E8 2021-02-03 13:42:20.031 AEDT [9555] DETAIL: There are no running transactions. 2021-02-03 13:42:20.031 AEDT [9555] STATEMENT: CREATE_REPLICATION_SLOT "tap_sub" LOGICAL pgoutput NOEXPORT_SNAPSHOT NOTICE: created replication slot "tap_sub" on publisher CREATE SUBSCRIPTION 2021-02-03 13:42:20.043 AEDT [9556] LOG: logical replication apply worker for subscription "tap_sub" has started 2021-02-03 13:42:20.043 AEDT [9556] LOG: !!>> The apply worker process has PID = 9556 [postgres@CentOS7-x64 ~]$ 2021-02-03 13:42:20.052 AEDT [9562] LOG: starting logical decoding for slot "tap_sub" 2021-02-03 13:42:20.052 AEDT [9562] DETAIL: Streaming transactions committing after 0/1660620, reading WAL from 0/16605E8. 2021-02-03 13:42:20.052 AEDT [9562] STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', publication_names '"tap_pub"') 2021-02-03 13:42:20.052 AEDT [9562] LOG: logical decoding found consistent point at 0/16605E8 2021-02-03 13:42:20.052 AEDT [9562] DETAIL: There are no running transactions. 2021-02-03 13:42:20.052 AEDT [9562] STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', publication_names '"tap_pub"') 2021-02-03 13:42:20.056 AEDT [9564] LOG: logical replication table synchronization worker for subscription "tap_sub", table "test_tab" has started 2021-02-03 13:42:20.056 AEDT [9564] LOG: !!>> The tablesync worker process has PID = 9564 2021-02-03 13:42:20.056 AEDT [9564] LOG: !!>> Sleeping 30 secs. For debugging, attach to process 9564 now! ## PID 9564 is sync worker ## PID 9556 is apply worker ## ## While paused in debugger for the tablesync worker do DROP TABLE on subscriber. ## Note we have not done any ALTER SUBSCRIPTION. ## psql -d test_sub -p 54321 -c "DROP TABLE test_tab;" DROP TABLE ## Following stacktrace LOG happens [postgres@CentOS7-x64 ~]$ TRAP: FailedAssertion("strvalue != NULL", File: "snprintf.c", Line: 442, PID: 9564) postgres: logical replication worker for subscription 16407 sync 16385 (ExceptionalCondition+0xb9)[0xad8f4b] postgres: logical replication worker for subscription 16407 sync 16385 [0xb4c6ee] postgres: logical replication worker for subscription 16407 sync 16385 (pg_vsnprintf+0x7c)[0xb4c072] postgres: logical replication worker for subscription 16407 sync 16385 (pvsnprintf+0x30)[0xb3f26f] postgres: logical replication worker for subscription 16407 sync 16385 (appendStringInfoVA+0x80)[0xb40cb6] postgres: logical replication worker for subscription 16407 sync 16385 (errmsg+0x183)[0xad9d33] postgres: logical replication worker for subscription 16407 sync 16385 [0x8c52f3] postgres: logical replication worker for subscription 16407 sync 16385 (LogicalRepSyncTableStart+0xb4)[0x8c6714] postgres: logical replication worker for subscription 16407 sync 16385 (ApplyWorkerMain+0x3f3)[0x8cbe30] postgres: logical replication worker for subscription 16407 sync 16385 (StartBackgroundWorker+0x21f)[0x87e957] postgres: logical replication worker for subscription 16407 sync 16385 [0x892fc7] postgres: logical replication worker for subscription 16407 sync 16385 [0x893375] postgres: logical replication worker for subscription 16407 sync 16385 [0x8923b0] /lib64/libpthread.so.0(+0xf5d0)[0x7f8cab2d15d0] /lib64/libc.so.6(__select+0x13)[0x7f8caa8dc0d3] postgres: logical replication worker for subscription 16407 sync 16385 [0x88e02e] postgres: logical replication worker for subscription 16407 sync 16385 (PostmasterMain+0x114f)[0x88da1d] postgres: logical replication worker for subscription 16407 sync 16385 [0x7911b7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f8caa809495] postgres: logical replication worker for subscription 16407 sync 16385 [0x480ec9] ## ## Because we are still attached to the tablesync worker we can see some more details of ## backtrace information which show it crashed tried to display a message using a bad relid. ## Program received signal SIGABRT, Aborted. 0x00007f8caa81d2c7 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007f8caa81d2c7 in raise () from /lib64/libc.so.6 #1 0x00007f8caa81e9b8 in abort () from /lib64/libc.so.6 #2 0x0000000000ad8f78 in ExceptionalCondition (conditionName=0xd5a44b "strvalue != NULL", errorType=0xd5a43b "FailedAssertion", fileName=0xd5a430 "snprintf.c", lineNumber=442) at assert.c:69 #3 0x0000000000b4c6ee in dopr (target=0x7fff4409bdb0, format=0xc8f419 "\" has finished", args=0x7fff4409be88) at snprintf.c:442 #4 0x0000000000b4c072 in pg_vsnprintf ( str=0x1c9c690 "logical replication table synchronization worker for subscription \"tap_sub\", table \"", '\177' ..., count=1024, fmt=0xc8f3c8 "logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished", args=0x7fff4409be88) at snprintf.c:195 #5 0x0000000000b3f26f in pvsnprintf ( buf=0x1c9c690 "logical replication table synchronization worker for subscription \"tap_sub\", table \"", '\177' ..., len=1024, fmt=0xc8f3c8 "logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished", args=0x7fff4409be88) at psprintf.c:110 #6 0x0000000000b40cb6 in appendStringInfoVA (str=0x7fff4409bea0, fmt=0xc8f3c8 "logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished", args=0x7fff4409be88) at stringinfo.c:149 #7 0x0000000000ad9d33 in errmsg ( fmt=0xc8f3c8 "logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished") at elog.c:925 #8 0x00000000008c52f3 in finish_sync_worker () at tablesync.c:136 #9 0x00000000008c6714 in LogicalRepSyncTableStart (origin_startpos=0x7fff4409c048) at tablesync.c:848 #10 0x00000000008cbe30 in ApplyWorkerMain (main_arg=1) at worker.c:3040 #11 0x000000000087e957 in StartBackgroundWorker () at bgworker.c:879 #12 0x0000000000892fc7 in do_start_bgworker (rw=0x1cc1c00) at postmaster.c:5833 #13 0x0000000000893375 in maybe_start_bgworkers () at postmaster.c:6058 #14 0x00000000008923b0 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5215 #15 #16 0x00007f8caa8dc0d3 in __select_nocancel () from /lib64/libc.so.6 ---Type to continue, or q to quit--- #17 0x000000000088e02e in ServerLoop () at postmaster.c:1694 #18 0x000000000088da1d in PostmasterMain (argc=3, argv=0x1c9a4c0) at postmaster.c:1402 #19 0x00000000007911b7 in main (argc=3, argv=0x1c9a4c0) at main.c:209 ( ## ## Since it crashed above, the remainder of this log is not really important... ## psql -d test_sub -p 54321 -c "DROP SUBSCRIPTION tap_sub;" 2021-02-03 13:43:47.323 AEDT [9556] FATAL: terminating logical replication worker due to administrator command 2021-02-03 13:43:47.323 AEDT [5230] LOG: background worker "logical replication worker" (PID 9556) exited with exit code 1 2021-02-03 13:43:57.857 AEDT [5230] LOG: background worker "logical replication worker" (PID 9564) was terminated by signal 6: Aborted 2021-02-03 13:43:57.857 AEDT [5230] LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost 2021-02-03 13:43:57.859 AEDT [5230] LOG: all server processes terminated; reinitializing [postgres@CentOS7-x64 ~]$ [postgres@CentOS7-x64 ~]$ 2021-02-03 13:43:57.892 AEDT [12298] LOG: database system was interrupted; last known up at 2021-02-03 13:39:47 AEDT 2021-02-03 13:43:58.587 AEDT [12298] LOG: database system was not properly shut down; automatic recovery in progress 2021-02-03 13:43:58.590 AEDT [12298] LOG: redo starts at 0/16574C8 2021-02-03 13:43:58.593 AEDT [12298] LOG: invalid record length at 0/1666D70: wanted 24, got 0 2021-02-03 13:43:58.593 AEDT [12298] LOG: redo done at 0/1666D38 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2021-02-03 13:43:58.630 AEDT [5230] LOG: database system is ready to accept connections 2021-02-03 13:43:58.650 AEDT [12338] LOG: logical replication apply worker for subscription "tap_sub" has started 2021-02-03 13:43:58.650 AEDT [12338] LOG: !!>> The apply worker process has PID = 12338 2021-02-03 13:43:58.658 AEDT [12341] LOG: starting logical decoding for slot "tap_sub" 2021-02-03 13:43:58.658 AEDT [12341] DETAIL: Streaming transactions committing after 0/1660620, reading WAL from 0/16605E8. 2021-02-03 13:43:58.658 AEDT [12341] STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', publication_names '"tap_pub"') 2021-02-03 13:43:58.658 AEDT [12341] LOG: logical decoding found consistent point at 0/16605E8 2021-02-03 13:43:58.658 AEDT [12341] DETAIL: There are no running transactions. 2021-02-03 13:43:58.658 AEDT [12341] STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '2', publication_names '"tap_pub"')