Discussion: BUG #17054: Memory corruption in logical replication worker when replicating into partitioned table
From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17054
Logged by:          Sergey Bernikov
Email address:      sbernikov@gmail.com
PostgreSQL version: 13.3
Operating system:   Ubuntu 18.04.4
Description:

When logical replication target is a partitioned table then execution of any
DDL on source table leads to crash of target (subscriber) server.

Steps to reproduce:

1. in source DB: create table and add to publication

    create table test_replication (
        id int not null,
        value varchar(100),
        primary key (id)
    );
    create publication test_publication for table test_replication;

2. in target DB: create partitioned table and start replication

    create table test_replication (
        id int not null,
        value varchar(100),
        primary key (id)
    ) partition by range (id);
    create table test_replication_p_1 partition of test_replication
        for values from (0) to (10);
    create table test_replication_p_2 partition of test_replication
        for values from (10) to (20);
    create subscription test_subscription CONNECTION '...'
        publication test_publication;

4. in source DB: insert and update data

    insert into test_replication(id, value) values (1, 'a1');
    insert into test_replication(id, value) values (2, 'a1');
    insert into test_replication(id, value) values (3, 'a1');
    update test_replication set value = 'a2';

5. in source DB: execute any DDL on the table

    vacuum test_replication;

6. in source DB: update data

    update test_replication set value = 'a3';

Result: logical replication worker on target server crashes with error
message:

    LOG: background worker "logical replication worker" (PID 28356) was terminated by signal 11: Segmentation fault
    LOG: terminating any other active server processes

Backtrace from core dump:

Core was generated by `postgres: 13/main: logical replication worker for subscription 781420 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000557026391fef in slot_modify_cstrings (slot=slot@entry=0x557026fa8298, srcslot=<optimized out>, rel=rel@entry=0x557026ff7370, values=values@entry=0x7ffff4135550, replaces=replaces@entry=0x7ffff4138950)
    at ./build/../src/backend/replication/logical/worker.c:434
434     ./build/../src/backend/replication/logical/worker.c: No such file or directory.
(gdb) bt
#0  0x0000557026391fef in slot_modify_cstrings (slot=slot@entry=0x557026fa8298, srcslot=<optimized out>, rel=rel@entry=0x557026ff7370, values=values@entry=0x7ffff4135550, replaces=replaces@entry=0x7ffff4138950)
    at ./build/../src/backend/replication/logical/worker.c:434
#1  0x0000557026392b9f in apply_handle_tuple_routing (relinfo=0x557026f80928, estate=estate@entry=0x557026fae108, remoteslot=remoteslot@entry=0x557026f813d8, newtup=newtup@entry=0x7ffff4135550, relmapentry=relmapentry@entry=0x557026f96d90, operation=operation@entry=CMD_UPDATE)
    at ./build/../src/backend/replication/logical/worker.c:1105
#2  0x00005570263934df in apply_handle_update (s=s@entry=0x7ffff41390a0)
    at ./build/../src/backend/replication/logical/worker.c:791
#3  0x00005570263941c1 in apply_dispatch (s=0x7ffff41390a0)
    at ./build/../src/backend/replication/logical/worker.c:1368
#4  LogicalRepApplyLoop (last_received=936525246824)
    at ./build/../src/backend/replication/logical/worker.c:1577
#5  ApplyWorkerMain (main_arg=<optimized out>)
    at ./build/../src/backend/replication/logical/worker.c:2123
#6  0x00005570263613ae in StartBackgroundWorker ()
    at ./build/../src/backend/postmaster/bgworker.c:879
#7  0x000055702636d5a3 in do_start_bgworker (rw=0x557026ec9110)
    at ./build/../src/backend/postmaster/postmaster.c:5870
#8  maybe_start_bgworkers ()
    at ./build/../src/backend/postmaster/postmaster.c:6095
#9  0x000055702636e035 in sigusr1_handler (postgres_signal_arg=<optimized out>)
    at ./build/../src/backend/postmaster/postmaster.c:5255
#10 <signal handler called>
#11 0x00007f4bb7bbcdd7 in __GI___select (nfds=nfds@entry=10, readfds=readfds@entry=0x7ffff4139870, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffff41397d0)
    at ../sysdeps/unix/sysv/linux/select.c:41
#12 0x000055702636e5f9 in ServerLoop ()
    at ./build/../src/backend/postmaster/postmaster.c:1703
#13 0x0000557026370423 in PostmasterMain (argc=5, argv=<optimized out>)
    at ./build/../src/backend/postmaster/postmaster.c:1412
#14 0x00005570260c19f8 in main (argc=5, argv=0x557026e73fd0)
    at ./build/../src/backend/main/main.c:210

PG Bug reporting form <noreply@postgresql.org> writes:
> When logical replication target is a partitioned table then execution of any
> DDL on source table leads to crash of target (subscriber) server.

Thanks for the report!  I duplicated the crash on v13 branch tip,
although it's hitting an assertion failure before reaching any
segfault:

#2  0x00000000008f466a in ExceptionalCondition (
    conditionName=conditionName@entry=0xa5eae0 "natts == rel->attrmap->maplen",
    errorType=errorType@entry=0x948cc9 "FailedAssertion",
    fileName=fileName@entry=0xa52956 "worker.c",
    lineNumber=lineNumber@entry=490) at assert.c:67
#3  0x0000000000777741 in slot_modify_cstrings (slot=slot@entry=0x2ec6e40,
    srcslot=<optimized out>, rel=rel@entry=0x2eca918,
    values=values@entry=0x7fffb3506480,
    replaces=replaces@entry=0x7fffb3509880) at worker.c:490
#4  0x00000000007785e7 in apply_handle_tuple_routing (
    edata=edata@entry=0x2ea45a0, remoteslot=remoteslot@entry=0x2ea48a0,
    newtup=newtup@entry=0x7fffb3506480, operation=operation@entry=CMD_UPDATE)
    at worker.c:1153
#5  0x0000000000778e74 in apply_handle_update (s=s@entry=0x7fffb3509fa0)
    at worker.c:846
#6  0x000000000077963c in apply_dispatch (s=0x7fffb3509fa0) at worker.c:1415
#7  LogicalRepApplyLoop (last_received=254887792) at worker.c:1624
#8  ApplyWorkerMain (main_arg=<optimized out>) at worker.c:2171
#9  0x0000000000743ec9 in StartBackgroundWorker () at bgworker.c:890

Interestingly, the same test case does NOT crash for me on master.
So apparently we fixed something that should have been back-patched.

            regards, tom lane

I wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> When logical replication target is a partitioned table then execution of any
>> DDL on source table leads to crash of target (subscriber) server.

> Thanks for the report!  I duplicated the crash on v13 branch tip,

I can't reproduce this anymore after commit b270713fd.  I think
it's probably the same thing I found while making a test for
your other report: logicalrep_partition_open() failed to ensure
that the LogicalRepPartMapEntry it built for a partition was fully
independent of that for the partition root, leading to trouble if
the root entry was later freed or rebuilt.

My failure to see a crash on HEAD was probably an accidental
issue of memory reuse patterns.

            regards, tom lane
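
For readers following the diagnosis above, here is a minimal C sketch of the general hazard: a per-partition entry that shares pointers with the root entry breaks once the root entry is freed or rebuilt. Every name in it (AttrMapSketch, RelEntrySketch, entry_copy_deep, entry_rebuild and so on) is hypothetical and heavily simplified. It is not the code of logicalrep_partition_open() or of commit b270713fd, and the thread does not say that the actual fix uses a deep copy; the sketch only shows one way to keep a derived entry fully independent of its source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real PostgreSQL structs. */
typedef struct AttrMapSketch
{
    int  maplen;   /* number of entries in attnums */
    int *attnums;  /* column mapping, owned by this map */
} AttrMapSketch;

typedef struct RelEntrySketch
{
    char          *relname;  /* owned by this entry */
    AttrMapSketch *attrmap;  /* owned by this entry */
} RelEntrySketch;

/* malloc wrapper that aborts on failure, to keep the sketch short. */
void *
xmalloc(size_t size)
{
    void *p = malloc(size ? size : 1);

    if (p == NULL)
        abort();
    return p;
}

/* Allocate an identity map with natts columns. */
AttrMapSketch *
attrmap_create(int natts)
{
    AttrMapSketch *map = xmalloc(sizeof(*map));

    assert(natts >= 0);
    map->maplen = natts;
    map->attnums = xmalloc(sizeof(int) * (size_t) natts);
    for (int i = 0; i < natts; i++)
        map->attnums[i] = i;
    return map;
}

/* Build an entry that owns its own name string and its own map. */
RelEntrySketch *
entry_create(const char *relname, int natts)
{
    RelEntrySketch *e = xmalloc(sizeof(*e));

    e->relname = xmalloc(strlen(relname) + 1);
    strcpy(e->relname, relname);
    e->attrmap = attrmap_create(natts);
    return e;
}

/* Release everything the entry owns. */
void
entry_free(RelEntrySketch *e)
{
    free(e->attrmap->attnums);
    free(e->attrmap);
    free(e->relname);
    free(e);
}

/*
 * Deep copy: the result shares no pointers with src.  A shallow copy
 * (dst->attrmap = src->attrmap) would leave dst holding a dangling
 * pointer as soon as src is rebuilt or freed.
 */
RelEntrySketch *
entry_copy_deep(const RelEntrySketch *src)
{
    RelEntrySketch *dst;

    assert(src != NULL && src->attrmap != NULL);
    dst = entry_create(src->relname, src->attrmap->maplen);
    memcpy(dst->attrmap->attnums, src->attrmap->attnums,
           sizeof(int) * (size_t) src->attrmap->maplen);
    return dst;
}

/* Simulate a rebuild of an entry: the old map is freed and replaced. */
void
entry_rebuild(RelEntrySketch *e, int natts)
{
    free(e->attrmap->attnums);
    free(e->attrmap);
    e->attrmap = attrmap_create(natts);
}
```

In this sketch, an entry obtained with entry_copy_deep() keeps its own maplen and attnums after entry_rebuild() is called on the source entry. With shared pointers, the same sequence would leave the copy reading freed memory, which is the class of failure that a check like "natts == rel->attrmap->maplen" can trip over.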