Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Дата
Msg-id CAA4eK1JOi49nVd5TbcMvVErSAM=jV9mbQD_kLNx2vxfaJmbZGQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Список pgsql-bugs
On Tue, Aug 1, 2023 at 6:45 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 24 Jan 2023, at 10:35, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Jan 21, 2023 at 5:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >>
> >> I have verified it and it looks good to me. I'll push and backpatch
> >> this till PG14 on Tuesday unless there are more comments/suggestions.
> >>
> >
> > Pushed.
>
> Hi!
>
> I've observed similar problem on one of our production clusters. Unfortunately, I barely managed to capture some
debuginformation. Cluster was deleted right after I started investigating the problem. 
>
> PG 15.3.
>
> "Drop database" was hanging in
> (gdb) bt
> #0  0x00007f215baa7907 in epoll_wait (epfd=13, events=0x563a71d25278, maxevents=1, timeout=timeout@entry=5000) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x0000563a713888db in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffe77d40b0, cur_timeout=5000,
set=0x563a71d25218)at ./build/../src/backend/storage/ipc/latch.c:1489 
> #2  WaitEventSetWait (set=0x563a71d25218, timeout=timeout@entry=5000,
occurred_events=occurred_events@entry=0x7fffe77d4100,nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=134217765)
>     at ./build/../src/backend/storage/ipc/latch.c:1435
> #3  0x0000563a71388c88 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=41,
timeout=timeout@entry=5000,wait_event_info=wait_event_info@entry=134217765) 
>     at ./build/../src/backend/storage/ipc/latch.c:497
> #4  0x0000563a713951f6 in ConditionVariableTimedSleep (cv=cv@entry=0x7f2150627f1c, timeout=timeout@entry=5000,
wait_event_info=wait_event_info@entry=134217765)
>     at ./build/../src/backend/storage/lmgr/condition_variable.c:163
> #5  0x0000563a7138e93b in WaitForProcSignalBarrier (generation=2) at
./build/../src/backend/storage/ipc/procsignal.c:415
> #6  0x0000563a711b2339 in dropdb (dbname=<optimized out>, missing_ok=<optimized out>, force=<optimized out>) at
./build/../src/backend/commands/dbcommands.c:1690
> #7  0x0000563a713b3ef3 in standard_ProcessUtility (pstmt=pstmt@entry=0x563a71d2ef18,
queryString=queryString@entry=0x563a71d2e218"DROP DATABASE \"db1\"", readOnlyTree=<optimized out>, 
>     context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
dest=0x563a71d2eff8,qc=0x7fffe77d4860) at ./build/../src/backend/tcop/utility.c:797 
> #8  0x00007f2154894443 in pgss_ProcessUtility (pstmt=0x563a71d2ef18, queryString=0x563a71d2e218 "DROP DATABASE
\"db1\"",readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, 
>     queryEnv=0x0, dest=0x563a71d2eff8, qc=0x7fffe77d4860) at
./build/../contrib/pg_stat_statements/pg_stat_statements.c:1206
> #9  0x0000563a713b2015 in PortalRunUtility (portal=portal@entry=0x563a71e1af58, pstmt=pstmt@entry=0x563a71d2ef18,
isTopLevel=isTopLevel@entry=true,setHoldSnapshot=setHoldSnapshot@entry=false, 
>     dest=0x563a71d2eff8, qc=0x7fffe77d4860) at ./build/../src/backend/tcop/pquery.c:1158
> #10 0x0000563a713b2140 in PortalRunMulti (portal=portal@entry=0x563a71e1af58, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,dest=dest@entry=0x563a71d2eff8, 
>     altdest=altdest@entry=0x563a71d2eff8, qc=qc@entry=0x7fffe77d4860) at ./build/../src/backend/tcop/pquery.c:1322
> #11 0x0000563a713b25cb in PortalRun (portal=0x563a71e1af58, count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimizedout>, dest=0x563a71d2eff8, altdest=0x563a71d2eff8, qc=0x7fffe77d4860) 
>     at ./build/../src/backend/tcop/pquery.c:791
> #12 0x0000563a71054f5e in exec_simple_query (query_string=0x563a71d2e218 "DROP DATABASE \"db1\"") at
./build/../src/backend/tcop/postgres.c:1250
> #13 0x0000563a713b0c64 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at
./build/../src/backend/tcop/postgres.c:4593
> #14 0x0000563a7132385a in BackendRun (port=<optimized out>, port=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:4518
> #15 BackendStartup (port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4246
> #16 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1806
> #17 0x0000563a7132478c in PostmasterMain (argc=9, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1478
> #18 0x0000563a7105607b in main (argc=9, argv=0x563a71d240e0) at ./build/../src/backend/main/main.c:202
> (gdb) select-frame 5
> (gdb) p ProcSignal->psh_slot[i]
> $1 = {pss_pid = 9131, pss_signalFlags = {0 <repeats 12 times>}, pss_barrierGeneration = {value = 0},
pss_barrierCheckMask= {value = 1}, pss_barrierCV = {mutex = 0 '\000', wakeup = {head = 1565, tail = 1565}}} 
> (gdb) p generation
> $2 = 2
>
>
> There were 3 inactive logical replication slots with disabled interrupts and bt like this
> 0x00007f215baa7907 in epoll_wait (epfd=6, events=0x563a71d26150, maxevents=1, timeout=timeout@entry=-1) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #0  0x00007f215baa7907 in epoll_wait (epfd=6, events=0x563a71d26150, maxevents=1, timeout=timeout@entry=-1) at
../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1  0x0000563a713888db in WaitEventSetWaitBlock (nevents=1, occurred_events=0x1, cur_timeout=-1, set=0x563a71d260d8)
at./build/../src/backend/storage/ipc/latch.c:1489 
> #2  WaitEventSetWait (set=0x563a71d260d8, timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7fffe77d4850,nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=100663296)at ./build/../src/backend/storage/ipc/latch.c:1435 
> #3  0x0000563a71287747 in secure_read (port=0x563a71d58780, ptr=0x563a719c7140 <PqRecvBuffer>, len=8192) at
./build/../src/backend/libpq/be-secure.c:186
> #4  0x0000563a7128e463 in pq_recvbuf () at ./build/../src/backend/libpq/pqcomm.c:957
> #5  0x0000563a7128f495 in pq_getbyte () at ./build/../src/backend/libpq/pqcomm.c:1000
> #6  0x0000563a713aed02 in SocketBackend (inBuf=0x7fffe77d4a30) at ./build/../src/backend/tcop/postgres.c:351
> #7  ReadCommand (inBuf=0x7fffe77d4a30) at ./build/../src/backend/tcop/postgres.c:474
> #8  PostgresMain (dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4525
> #9  0x0000563a7132385a in BackendRun (port=<optimized out>, port=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:4518
> #10 BackendStartup (port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4246
> #11 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1806
> #12 0x0000563a7132478c in PostmasterMain (argc=9, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1478
> #13 0x0000563a7105607b in main (argc=9, argv=0x563a71d240e0) at ./build/../src/backend/main/main.c:202
>
> in pg_stat_activity query was like "create table...", but backend type was walsender and it was idle.
>

Does this mean that Drop Database is waiting for walsender instead of
tablesync worker? Can you show the pg_stat_activity row of backend for
which the Drop Database command is waiting?

--
With Regards,
Amit Kapila.



В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18046: stats collection behaviour change is affecting the usability of information.
Следующее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: DROP DATABASE deadlocks with logical replication worker in PG 15.1