Обсуждение: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.
The following bug has been logged on the website: Bug reference: 13128 Logged by: Grant McAlister Email address: grant@amazon.com PostgreSQL version: 9.3.6 Operating system: Linux 64bit Description: Steps to reproduce 1. Set max_prepared_transactions to 2 in postgresql.conf. 2. start Postgres. 3. Create two uncommitted prepared transactions: BEGIN; PREPARE TRANSACTION âtest1â; BEGIN; PREPARE TRANSACTION âtest2â; 4. Set max_prepared_transactions to 1 in postgresql.conf. 5. Restart Postgres again. At this point the startup will fail with a fatal but the postmaster process keeps running. LOG: database system was interrupted; last known up at 2015-04-16 17:19:56 PDT LOG: database system was not properly shut down; automatic recovery in progress LOG: record with zero length at 0/1826C70 LOG: redo is not required LOG: recovering prepared transaction 1685 LOG: recovering prepared transaction 1683 FATAL: maximum number of prepared transactions reached HINT: Increase max_prepared_transactions (currently 1). Looks like their may be a LWLock that is not correctly getting released which causes the process to hang rather then exit.
On Thu, Apr 23, 2015 at 7:09 AM, <grant@amazon.com> wrote: > 1. Set max_prepared_transactions to 2 in postgresql.conf. > 2. start Postgres. > 3. Create two uncommitted prepared transactions: BEGIN; PREPARE > TRANSACTION 'test1'; BEGIN; PREPARE TRANSACTION 'test2'; > 4. Set max_prepared_transactions to 1 in postgresql.conf. > 5. Restart Postgres again. > > At this point the startup will fail with a fatal but the postmaster process > keeps running. > > LOG: database system was interrupted; last known up at 2015-04-16 17:19:56 > PDT > LOG: database system was not properly shut down; automatic recovery in > progress > LOG: record with zero length at 0/1826C70 > LOG: redo is not required > LOG: recovering prepared transaction 1685 > LOG: recovering prepared transaction 1683 > FATAL: maximum number of prepared transactions reached > HINT: Increase max_prepared_transactions (currently 1). > > Looks like their may be a LWLock that is not correctly getting released > which causes the process to hang rather then exit. Yep, the startup process remains stuck here, so we should release the lock before issuing ERROR in twophase.c: * frame #0: 0x00007fff95004b72 libsystem_kernel.dylib`semop + 10 frame #1: 0x000000010f59530f postgres`PGSemaphoreLock(sema=0x00000001187a1190) + 63 at pg_sema.c:387 frame #2: 0x000000010f632af9 postgres`LWLockAcquireCommon(lock=0x000000010fc452e0, mode=LW_EXCLUSIVE, valptr=0x0000000000000000, val=0) + 377 at lwlock.c:1037 frame #3: 0x000000010f63296b postgres`LWLockAcquire(l=0x000000010fc452e0, mode=LW_EXCLUSIVE) + 43 at lwlock.c:900 frame #4: 0x000000010f2f975d postgres`AtAbort_Twophase + 93 at twophase.c:284 frame #5: 0x000000010f2f9f24 postgres`AtProcExit_Twophase(code=1, arg=0) + 20 at twophase.c:246 The patch attached fixes the problem for me, and I think that this should be backpatched as well. (Note to self: check the other calls of LWLockAcquire/Release to see if there are other code paths in the same situation). Regards, -- Michael
Вложения
Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.
От
Heikki Linnakangas
Дата:
On 04/23/2015 04:20 AM, Michael Paquier wrote: > On Thu, Apr 23, 2015 at 7:09 AM, <grant@amazon.com> wrote: >> 1. Set max_prepared_transactions to 2 in postgresql.conf. >> 2. start Postgres. >> 3. Create two uncommitted prepared transactions: BEGIN; PREPARE >> TRANSACTION 'test1'; BEGIN; PREPARE TRANSACTION 'test2'; >> 4. Set max_prepared_transactions to 1 in postgresql.conf. >> 5. Restart Postgres again. >> >> At this point the startup will fail with a fatal but the postmaster process >> keeps running. >> >> LOG: database system was interrupted; last known up at 2015-04-16 17:19:56 >> PDT >> LOG: database system was not properly shut down; automatic recovery in >> progress >> LOG: record with zero length at 0/1826C70 >> LOG: redo is not required >> LOG: recovering prepared transaction 1685 >> LOG: recovering prepared transaction 1683 >> FATAL: maximum number of prepared transactions reached >> HINT: Increase max_prepared_transactions (currently 1). >> >> Looks like their may be a LWLock that is not correctly getting released >> which causes the process to hang rather then exit. > > Yep, the startup process remains stuck here, so we should release the > lock before issuing ERROR in twophase.c: As Tom noted in the other thread, that is not generally required. LWLockReleaseAll() is call during process exit. Hmm. What happens here is that when the startup process is about to exit, because of the ERROR, it calls the shmem-exit hooks. AtProcExit_Twophase sees that MyLockedGxact != NULL, and it tries to clear MyLockedGxact->locking_backend to release the entry. That's bogus. MyLockedGxact != NULL indicates that the backend is currently operating on the entry. But that's not the case at that point during RecoverPreparedTransactions(). It has already completed recovering the previous transaction, and has not yet locked a GlobalTransaction entry for the next one. RecoverPreparedTransactions() should clear MyLockedGxact after it has recovered the transaction, after the StandbyReleaseLockTree() call. That's analogous to the normal PREPARE TRANSACTION path, in PrepareTransaction(), where we call PostPrepare_Twophase() after releasing the locks. Attached is a patch for that. I introduced this bug in commit bb38fb0d43c8d7ff54072bfd8bd63154e536b384, which added the proc-exit hook and the call PostPrepare_Twophase(), so unless you see more bugs in this, I'll commit and backpatch this to all versions. - Heikki
Вложения
Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.
От
Heikki Linnakangas
Дата:
On 04/23/2015 10:32 AM, Heikki Linnakangas wrote: > Attached is a patch for that. I introduced this bug in commit > bb38fb0d43c8d7ff54072bfd8bd63154e536b384, which added the proc-exit hook > and the call PostPrepare_Twophase(), so unless you see more bugs in > this, I'll commit and backpatch this to all versions. Committed. - Heikki