Discussion: BUG #17954: Postgres startup fails with `could not locate a valid checkpoint record`
BUG #17954: Postgres startup fails with `could not locate a valid checkpoint record`
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17954
Logged by: Utkarsh Srivastava
Email address: srivastavautkarsh8097@gmail.com
PostgreSQL version: 12.12
Operating system: RHEL/Linux
Description:

Hi everyone,

Thank you for your time.

We are running PostgreSQL 12.12 in a CRI-O container on top of CephFS. A few days ago we noticed that DB startup was failing with the following error:

```
2023-05-14 05:13:13.678 UTC [1] LOG: received smart shutdown request
2023-05-14 05:13:36.692 UTC [1] LOG: could not open file "postmaster.pid": No such file or directory
2023-05-14 05:13:36.692 UTC [1] LOG: performing immediate shutdown because data directory lock file is invalid
2023-05-14 05:13:36.692 UTC [1] LOG: received immediate shutdown request
2023-05-14 05:13:36.692 UTC [1] LOG: could not open file "postmaster.pid": No such file or directory
2023-05-14 05:13:36.692 UTC [261282] WARNING: terminating connection because of crash of another server process
2023-05-14 05:13:36.692 UTC [261282] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2023-05-14 05:13:36.692 UTC [261282] HINT: In a moment you should be able to reconnect to the database and repeat your command.
< --- Trimmed repetition --->
2023-05-14 05:13:36.739 UTC [1] LOG: database system is shut down
2023-05-14 05:13:37.723 UTC [24] LOG: database system was shut down at 2023-05-14 05:13:17 UTC
2023-05-14 05:13:37.723 UTC [24] LOG: invalid resource manager ID 101 at 9/8BF289E8
2023-05-14 05:13:37.723 UTC [24] LOG: invalid primary checkpoint record
2023-05-14 05:13:37.723 UTC [24] PANIC: could not locate a valid checkpoint record
2023-05-14 05:13:39.961 UTC [22] LOG: startup process (PID 24) was terminated by signal 6: Aborted
2023-05-14 05:13:39.961 UTC [22] LOG: aborting startup due to startup process failure
2023-05-14 05:13:40.117 UTC [22] LOG: database system is shut down
2023-05-14 05:14:06.726 UTC [24] LOG: database system was shut down at 2023-05-14 05:13:17 UTC
2023-05-14 05:14:06.726 UTC [24] LOG: invalid resource manager ID 101 at 9/8BF289E8
```

- What could be the root cause of this issue?
- Is this a known issue (I did search the archives but couldn't find it though)? If yes, is this fixed in a PG 13, 14, 15?

Thank you
Re: BUG #17954: Postgres startup fails with `could not locate a valid checkpoint record`
From
Michael Paquier
Date:
On Thu, Jun 01, 2023 at 01:11:20PM +0000, PG Bug reporting form wrote:
> - What could be the root cause of this issue?
> - Is this a known issue (I did search the archives but couldn't find it
> though)? If yes, is this fixed in a PG 13, 14, 15?

Hard to say for sure, but it looks like your host has a few problems. This part from your logs refers to something that should not happen, to begin with:

> 2023-05-14 05:13:13.678 UTC [1] LOG: received smart shutdown request
> 2023-05-14 05:13:36.692 UTC [1] LOG: could not open file "postmaster.pid":
> No such file or directory
> 2023-05-14 05:13:36.692 UTC [1] LOG: performing immediate shutdown because
> data directory lock file is invalid
> 2023-05-14 05:13:36.692 UTC [1] LOG: received immediate shutdown request
> 2023-05-14 05:13:36.692 UTC [1] LOG: could not open file "postmaster.pid":
> No such file or directory

This LOG would come from either AddToDataDirLockFile() or RecheckDataDirLockFile(). Still, the third entry I am quoting refers to a recheck of the PID file, meaning that the postmaster has bumped into what looks like a corrupted PID file.
--
Michael
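
Note for readers of the thread: the recheck mentioned above can be pictured roughly as follows. This is only a minimal Python sketch of the general idea (re-read the lock file and compare the PID on its first line with the running process), not the actual C code of RecheckDataDirLockFile(). The function name `recheck_lock_file`, its return convention, and the handling of unparsable content are assumptions made for illustration.

```python
import os
import tempfile


def recheck_lock_file(path, my_pid):
    """Hypothetical helper: return True if the lock file still looks like
    ours, False if it should be treated as invalid."""
    try:
        with open(path, "r") as f:
            first_line = f.readline()
    except (FileNotFoundError, NotADirectoryError) as exc:
        # A vanished lock file is treated as invalid, which is the
        # situation the quoted log lines describe.
        print(f'LOG: could not open file "{path}": {exc.strerror}')
        return False
    except OSError as exc:
        # Assumption: other I/O errors may be transient, so they are
        # reported but not treated as proof that the lock file is bad.
        print(f'LOG: could not open file "{path}": {exc.strerror}')
        return True

    try:
        file_pid = int(first_line.strip())
    except ValueError:
        # Assumption: unparsable content counts as a PID mismatch.
        file_pid = None

    if file_pid != my_pid:
        print(f'LOG: lock file "{path}" contains wrong PID')
        return False
    return True


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()
    pid_file = os.path.join(data_dir, "postmaster.pid")

    # The file does not exist yet, so the recheck fails.
    print(recheck_lock_file(pid_file, os.getpid()))

    # After writing our own PID on the first line, the recheck passes.
    with open(pid_file, "w") as f:
        f.write(f"{os.getpid()}\n{data_dir}\n")
    print(recheck_lock_file(pid_file, os.getpid()))
```

In this picture, a `postmaster.pid` that disappears while the server is running (for example because of a problem in the underlying filesystem) makes the recheck fail, which matches the "data directory lock file is invalid" message followed by an immediate shutdown.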