Discussion: SIGSEGV when trying to start in single user mode
Hello List,

I have a problem with my PostgreSQL 8.3.4 installation. We had some problems with our storage subsystem, and it seems PostgreSQL suffered a little bit from it. Here are some log excerpts.

When trying to start PostgreSQL:

---
# /etc/init.d/postgresql-8.3 start
Starting PostgreSQL 8.3 database server: main* Removed stale pid file.
The PostgreSQL server failed to start. Please check the log output:
2009-09-19 16:51:00 CEST LOG: could not load root certificate file "root.crt": no SSL error reported
2009-09-19 16:51:00 CEST DETAIL: Will not verify client certificates.
2009-09-19 16:51:00 CEST LOG: could not create IPv6 socket: Address family not supported by protocol
2009-09-19 16:51:00 CEST LOG: database system was interrupted while in recovery at 2009-09-19 16:47:52 CEST
2009-09-19 16:51:00 CEST HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2009-09-19 16:51:00 CEST LOG: database system was not properly shut down; automatic recovery in progress
2009-09-19 16:51:00 CEST LOG: incomplete startup packet
2009-09-19 16:51:00 CEST LOG: redo starts at 44D/CEAFB200
2009-09-19 16:51:00 CEST LOG: unexpected pageaddr 44D/B8B0A000 in log file 1101, segment 206, offset 11575296
2009-09-19 16:51:00 CEST LOG: redo done at 44D/CEB062C0
2009-09-19 16:51:00 CEST PANIC: right sibling's left-link doesn't match: block 49696 links to 49978 instead of expected 3 in index "132010"
2009-09-19 16:51:00 CEST LOG: startup process (PID 3727) was terminated by signal 6: Aborted
2009-09-19 16:51:00 CEST LOG: aborting startup due to startup process failure
failed!
---

I think the index is not a system index. But when I tried to start PostgreSQL in single-user mode to be able to repair this index, I got the SIGSEGV mentioned in the subject.

Here is the last part of the strace output: http://pastie.org/622807

Here is the gdb output:

---
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47930899067328 (LWP 3881)]
0x00000000005a7fc0 in SHMQueueInsertBefore ()
(gdb) bt
#0 0x00000000005a7fc0 in SHMQueueInsertBefore ()
#1 0x00000000005ac7a0 in LockAcquire ()
#2 0x00000000005aa416 in LockRelationForExtension ()
#3 0x0000000000469b0e in _bt_getbuf ()
#4 0x0000000000467759 in _bt_getstackbuf ()
#5 0x00000000004687d2 in _bt_insert_parent ()
#6 0x000000000046fc34 in btree_xlog_cleanup ()
#7 0x000000000047fbfb in StartupXLOG ()
#8 0x00000000005b899d in PostgresMain ()
#9 0x00000000005448df in main ()
---

It's a Debian system, and I think there are no debug symbols in the package (gdb reports "no debugging symbols found").

Does anyone know what to do?

Thanks in advance,
Björn
Björn Häuser <bjoernhaeuser@googlemail.com> writes:
> I have a problem with my PostgreSQL 8.3.4 installation.

> We had some problems with our storage subsystem and it seems
> postgresql suffered a little bit from it.
> Here are some log excerpts:

> # /etc/init.d/postgresql-8.3 start
> Starting PostgreSQL 8.3 database server: main* Removed stale pid file.

You really need to get rid of that startup script, or at least get rid
of the part of it that thinks it should remove the postmaster's PID
file. That's completely unsafe and poor practice. (I doubt it's
related to your immediate problem, though.)

> 2009-09-19 16:51:00 CEST PANIC: right sibling's left-link doesn't
> match: block 49696 links to 49978 instead of expected 3 in index
> "132010"
> 2009-09-19 16:51:00 CEST LOG: startup process (PID 3727) was
> terminated by signal 6: Aborted

Ugh, so you have a corrupted index that is touched by the unreplayed
WAL sequence. I'm afraid the only easy way out of this is to use
pg_resetxlog, which is a bit risky since you'll lose whatever other
changes haven't been applied to the database. Probably the safest
thing to do is pg_resetxlog, start up, dump everything, initdb,
reload.

> But when I tried to start Postgresql in single-user mode to be able to
> repair this index i am getting the mentioned SIGSEGV.

Hmm, that's a bug, but even if it weren't broken it would not help you.
A single-user backend still has to replay any unreplayed WAL, so it
would still hit the PANIC.

regards, tom lane
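
[Editor's sketch, not part of the original message.] The sequence suggested above (pg_resetxlog, start up, dump everything, initdb, reload) could look roughly like the following. This only builds and prints the command list; it does not run anything. The paths are hypothetical, the initial file-level copy of the data directory is an extra precaution that the thread does not mention, and it assumes the commands are run as the postgres user with the 8.3 binaries on PATH (on Debian they usually live in a version-specific directory).

```python
import shlex


def recovery_plan(old_datadir, new_datadir, dump_file):
    """Return, in order, the shell commands for reset / dump / initdb / reload.

    All three arguments are hypothetical paths chosen by the operator.
    """
    q = shlex.quote
    return [
        # Precaution (assumption, not from the thread): keep a file-level copy
        # of the damaged cluster while the server is stopped.
        f"cp -a {q(old_datadir)} {q(old_datadir + '.bak')}",
        # Discard the unreplayed WAL; changes not yet applied are lost.
        f"pg_resetxlog {q(old_datadir)}",
        # Start the old cluster and wait until it accepts connections.
        f"pg_ctl -D {q(old_datadir)} -w start",
        # Dump every database plus roles into one SQL script.
        f"pg_dumpall > {q(dump_file)}",
        # Stop the old cluster before creating the new one.
        f"pg_ctl -D {q(old_datadir)} -m fast stop",
        # Create a fresh cluster, start it, and reload the dump.
        f"initdb -D {q(new_datadir)}",
        f"pg_ctl -D {q(new_datadir)} -w start",
        f"psql -f {q(dump_file)} postgres",
    ]


if __name__ == "__main__":
    for cmd in recovery_plan("/srv/pg/8.3/main", "/srv/pg/8.3/main_new",
                             "/srv/pg/all_databases.sql"):
        print(cmd)
```

Printing the plan first makes it easy to review each step before typing it by hand, which seems preferable for a one-off repair of a damaged cluster. Note that in PostgreSQL 10 and later the tool is called pg_resetwal.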
Hello Tom,

thank you for your help. I reset the xlog and the server started again.

Regards,
Björn