Re: UTC4115FATAL: the database system is in recovery mode
От | Robert Haas |
---|---|
Тема | Re: UTC4115FATAL: the database system is in recovery mode |
Дата | |
Msg-id | BANLkTindfxAEGUt+kdqGFBAtRVuoVikwNg@mail.gmail.com обсуждение исходный текст |
Ответ на | UTC4115FATAL: the database system is in recovery mode (Mathew Samuel <Mathew.Samuel@entrust.com>) |
Ответы |
Re: UTC4115FATAL: the database system is in recovery mode
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-bugs |
On Fri, May 27, 2011 at 2:26 PM, Mathew Samuel <Mathew.Samuel@entrust.com> wrote: > I see the following error as found in pg.log: > UTC4115FATAL:=A0 the database system is in recovery mode > > Actually that message was logged repeatedly for about 4 hours according to > the logs (I don't have access to the system itself, just the logs). > > Leading up to that error were the following in pg.log: > 2011-03-28 10:44:06 UTC3609LOG:=A0 checkpoints are occurring too frequent= ly > (11 seconds apart) > 2011-03-28 10:44:06 UTC3609HINT:=A0 Consider increasing the configuration > parameter "checkpoint_segments". > 2011-03-28 10:44:18 UTC3609LOG:=A0 checkpoints are occurring too frequent= ly > (12 seconds apart) > 2011-03-28 10:44:18 UTC3609HINT:=A0 Consider increasing the configuration > parameter "checkpoint_segments". > 2011-03-28 10:44:28 UTC3609LOG:=A0 checkpoints are occurring too frequent= ly > (10 seconds apart) > 2011-03-28 10:44:28 UTC3609HINT:=A0 Consider increasing the configuration > parameter "checkpoint_segments". > 2011-03-28 10:44:38 UTC3609LOG:=A0 checkpoints are occurring too frequent= ly > (10 seconds apart) > 2011-03-28 10:44:38 UTC3609HINT:=A0 Consider increasing the configuration > parameter "checkpoint_segments". > 2011-03-28 10:44:42 UTC3932ERROR:=A0 canceling statement due to statement > timeout > 2011-03-28 10:44:42 UTC3932STATEMENT:=A0 vacuum full analyze _zamboni.sl_= log_1 > 2011-03-28 10:44:42 UTC3932PANIC:=A0 cannot abort transaction 1827110275,= it > was already committed > 2011-03-28 10:44:42 UTC3566LOG:=A0 server process (PID 3932) was terminat= ed by > signal 6 > 2011-03-28 10:44:42 UTC3566LOG:=A0 terminating any other active server > processes > 2011-03-28 10:44:42 UTC13142WARNING:=A0 terminating connection because of > crash of another server process > 2011-03-28 10:44:42 UTC13142DETAIL:=A0 The postmaster has commanded this > server process to roll back the current transaction and exit, because > another server process exited abnormally and possibly corrupted shared > memory. > 2011-03-28 10:44:42 UTC13142HINT:=A0 In a moment you should be able to > reconnect to the database and repeat your command. > 2011-03-28 10:44:42 UTC29834WARNING:=A0 terminating connection because of > crash of another server process > 2011-03-28 10:44:42 UTC29834DETAIL:=A0 The postmaster has commanded this > server process to roll back the current transaction and exit, because > another server process exited abnormally and possibly corrupted shared > memory. > 2011-03-28 10:44:42 UTC29834HINT:=A0 In a moment you should be able to > reconnect to the database and repeat your command. > 2011-03-28 10:44:42 UTC3553WARNING:=A0 terminating connection because of = crash > of another server process > 2011-03-28 10:44:42 UTC3553DETAIL:=A0 The postmaster has commanded this s= erver > process to roll back the current transaction and exit, because another > server process exited abnormally and possibly corrupted shared memory. > In fact those last 3 lines are repeated over and over again repeatedly un= til > "UTC4115FATAL:=A0 the database system is in recovery mode" is logged for 4 > hours. At some point, 4 hours later of course, it appears that the system > recovers. > > The Checkpoints Settings in postgresql.conf are commented out so I guess = the > defaults are being used: > #checkpoint_segments =3D 3=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 #= in logfile segments, min 1, 16MB > each > #checkpoint_timeout =3D 5min=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 # ran= ge 30s-1h > #checkpoint_warning =3D 30s=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 # 0= is off > That system where this was seen was using pgsql-8.2.6 on RHEL4. > > Not sure if this is a known bug (or if it is a bug at all or something I = can > address using different configuration) but I thought I would post here fi= rst > if any one might be familiar with this issue and suggest a possible > solution. Any ideas? Not really, although I've seen a few other crashes attributable to trying to cancel VACUUM FULL. As of PostgreSQL 9.0, VACUUM FULL has been completely rewritten, so whatever bug is causing this problem is almost certainly gone in more recent versions. It would be nice to track done and fix whatever is wrong in the older branches, but we'd probably need a reproducible test case of some sort. I don't think it craps out like that EVERY time you interrupt it... --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Alexey KlyukinДата:
Сообщение: Re: BUG #6048: TRUNCATE vs TRUNCATE CASCADE: misleading message