Thread: server crash


server crash

From
"Medora Schauer"
Date:

OS: linux 2.4.18

h/w: moto 5100 SBC

PG: 7.3

We have a recurring problem with the pg server crashing.  It doesn’t happen often (once every couple of months or so) but we require 24/7 uptime.  Attached is the pg.log from the last time it crashed.  The stuff towards the end seems to indicate some kind of serious problem but beyond that it means little to me.  I’m hoping it will be more meaningful to someone on this list.  Suggestions about how to obtain more information would also be welcome.

Is there any way to get timestamps in the log file?  It would be helpful to know which of the messages are related in a time sense at least.

Is there a description of the various postmaster debug levels?  I might try upping the level, but since the crash doesn’t happen very often it would be helpful if I knew which level would yield the most useful information without clogging the file with unneeded detail.

Any help will be greatly appreciated,

Medora Schauer

Attachments

Re: server crash

From
Tom Lane
Date:
"Medora Schauer" <mschauer@fairfield.com> writes:
> PG: 7.3

I hope you meant "7.3.8", or at least something later than original 7.3 ;-)

> PANIC:  link from /data/database/pg_xlog/00000001000000D9 to /data/database/pg_xlog/00000001000000E1 (initialization of log file 1, segment 225) failed: No such file or directory

AFAICS, the only way that that message could come out is if
a readdir() scan of /data/database/pg_xlog found a file named
00000001000000D9 but then an immediately following link() call
didn't find it.  This suggests filesystem corruption to me;
it might be worth running fsck.  Also check whether your kernel
is reasonably up to date, as it could be a kernel bug.

The fact that the subsequent startup process failed again in just the
same way seems to eliminate most of the theories that would suggest
it's a Postgres bug.  In particular, it isn't a race condition such as
two processes trying to delete the same file at the same time, because
the recovery startup sequence does no parallel processing.

            regards, tom lane
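
For readers following along, the sequence described in the reply above (a directory scan that finds a file, followed at once by a link() call on it) can be sketched roughly as below. This is only an illustration in Python, not the PostgreSQL source: the function names scan_then_link and demo and the vanish flag are made up for the example. os.scandir and os.link are used here as stand-ins for readdir() and link(). The vanish flag simulates the anomaly by deleting the file between the two steps, something a healthy filesystem with no concurrent deleter should never do by itself.

```python
import errno
import os
import tempfile


def scan_then_link(directory, src_name, dst_name, vanish=False):
    """Find src_name by scanning the directory, then hard-link it to dst_name.

    Returns 0 on success, or the errno reported by the failed link attempt.
    (Hypothetical helper for illustration only.)
    """
    # Step 1: directory scan, standing in for the readdir() loop.
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries]
    if src_name not in names:
        raise LookupError(f"{src_name} was not seen by the directory scan")

    src = os.path.join(directory, src_name)
    dst = os.path.join(directory, dst_name)

    # Simulated anomaly: the file the scan just reported disappears
    # before the link call runs.
    if vanish:
        os.unlink(src)

    # Step 2: hard link, standing in for the link() call.
    try:
        os.link(src, dst)
    except OSError as exc:
        return exc.errno
    return 0


def demo():
    """Run the normal case and the simulated-anomaly case in a temp directory."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "00000001000000D9"), "wb").close()
        ok = scan_then_link(d, "00000001000000D9", "00000001000000E1")
        failed = scan_then_link(
            d, "00000001000000D9", "00000001000000E2", vanish=True
        )
    return ok, failed


if __name__ == "__main__":
    ok, failed = demo()
    print("normal case:", ok)
    print("file vanished between scan and link:", errno.errorcode[failed])
```

In the normal case the link succeeds. Only when the file vanishes between the scan and the link does the call fail with ENOENT, the errno behind "No such file or directory". With a single recovery process and nothing else deleting files, that points at the filesystem or kernel rather than at a race.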