Hi,
Good catch.
The problem is here:
On 2023-01-13 20:53:49 +0530, Lakshmi Narayanan Sreethar wrote:
> #7 0x0000559cccbe1e71 in LogicalRepSyncTableStart
> (origin_startpos=0x7fffb26f7728) at
> /pg15.1/src/backend/replication/logical/tablesync.c:1353
Because the logical rep code explicitly prevents interrupts:

	/*
	 * Create a new permanent logical decoding slot. This slot will be used
	 * for the catchup phase after COPY is done, so tell it to use the
	 * snapshot to make the final data consistent.
	 *
	 * Prevent cancel/die interrupts while creating slot here because it is
	 * possible that before the server finishes this command, a concurrent
	 * drop subscription happens which would complete without removing this
	 * slot leading to a dangling slot on the server.
	 */
	HOLD_INTERRUPTS();
	walrcv_create_slot(LogRepWorkerWalRcvConn,
					   slotname, false /* permanent */ , false /* two_phase */ ,
					   CRS_USE_SNAPSHOT, origin_startpos);
	RESUME_INTERRUPTS();

Which is just completely wrong, even independent of this issue. Not
allowing termination for the duration of a command executed over the network?
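
To make the consequence concrete, here is a toy model of the holdoff
behavior. It is only a sketch, not PostgreSQL code: every toy_* / TOY_* name
is made up for illustration, and the "remote command" is just a loop that
polls for interrupts the way a wait on the network would. The assumption is
that a terminate request arrives partway through the wait.

```c
#include <stdbool.h>

/* Toy stand-ins, loosely modeled on the backend's interrupt globals. */
static volatile int  holdoff_count = 0;   /* plays the role of a holdoff counter */
static volatile bool die_pending = false; /* a terminate request has arrived */
static bool terminated = false;           /* the request has been acted upon */

#define TOY_HOLD_INTERRUPTS()   (holdoff_count++)
#define TOY_RESUME_INTERRUPTS() (holdoff_count--)

/* Act on a pending terminate request only while interrupts are not held. */
static void toy_check_for_interrupts(void)
{
    if (die_pending && holdoff_count == 0)
    {
        die_pending = false;
        terminated = true;
    }
}

/*
 * Hypothetical remote command: "waits" for up to wait_iterations rounds,
 * polling for interrupts each round. A terminate request arrives in the
 * second round. Returns the number of rounds spent waiting.
 */
static int toy_remote_command(int wait_iterations)
{
    int spent = 0;

    for (int i = 0; i < wait_iterations && !terminated; i++)
    {
        if (i == 1)
            die_pending = true;
        toy_check_for_interrupts();
        if (!terminated)
            spent++;
    }
    return spent;
}

/* Run the remote command with or without holding interrupts around it. */
static int toy_create_slot(bool hold, int wait_iterations)
{
    holdoff_count = 0;
    die_pending = false;
    terminated = false;

    if (hold)
        TOY_HOLD_INTERRUPTS();
    int spent = toy_remote_command(wait_iterations);
    if (hold)
        TOY_RESUME_INTERRUPTS();

    /* When held, the request is only honored here, after the full wait. */
    toy_check_for_interrupts();
    return spent;
}
```

In this model toy_create_slot(false, 5) stops waiting as soon as the request
arrives, while toy_create_slot(true, 5) sits through the entire wait before
the request is honored. If the remote side never answers, the held variant
cannot be terminated at all.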
This is from:

commit 6b67d72b604cb913e39324b81b61ab194d94cba0
Author: Amit Kapila <akapila@postgresql.org>
Date:   2021-03-17 08:15:12 +0530

    Fix race condition in drop subscription's handling of tablesync slots.

    Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
    Subscription to drop such slots. However, it is possible that before
    tablesync worker could get the acknowledgment of slot creation, drop
    subscription stops it and that can lead to a dangling slot on the
    publisher. Prevent cancel/die interrupts while creating a slot in the
    tablesync worker.

    Reported-by: Thomas Munro as per buildfarm
    Author: Amit Kapila
    Reviewed-by: Vignesh C, Takamichi Osumi
    Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com

But this can't be the right fix.
Greetings,
Andres Freund