Re: UTC4115FATAL: the database system is in recovery mode

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: UTC4115FATAL: the database system is in recovery mode
Дата
Msg-id BANLkTindfxAEGUt+kdqGFBAtRVuoVikwNg@mail.gmail.com
обсуждение исходный текст
Ответ на UTC4115FATAL: the database system is in recovery mode  (Mathew Samuel <Mathew.Samuel@entrust.com>)
Ответы Re: UTC4115FATAL: the database system is in recovery mode  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On Fri, May 27, 2011 at 2:26 PM, Mathew Samuel
<Mathew.Samuel@entrust.com> wrote:
> I see the following error as found in pg.log:
> UTC4115FATAL:=A0 the database system is in recovery mode
>
> Actually that message was logged repeatedly for about 4 hours according to
> the logs (I don't have access to the system itself, just the logs).
>
> Leading up to that error were the following in pg.log:
> 2011-03-28 10:44:06 UTC3609LOG:=A0 checkpoints are occurring too frequent=
ly
> (11 seconds apart)
> 2011-03-28 10:44:06 UTC3609HINT:=A0 Consider increasing the configuration
> parameter "checkpoint_segments".
> 2011-03-28 10:44:18 UTC3609LOG:=A0 checkpoints are occurring too frequent=
ly
> (12 seconds apart)
> 2011-03-28 10:44:18 UTC3609HINT:=A0 Consider increasing the configuration
> parameter "checkpoint_segments".
> 2011-03-28 10:44:28 UTC3609LOG:=A0 checkpoints are occurring too frequent=
ly
> (10 seconds apart)
> 2011-03-28 10:44:28 UTC3609HINT:=A0 Consider increasing the configuration
> parameter "checkpoint_segments".
> 2011-03-28 10:44:38 UTC3609LOG:=A0 checkpoints are occurring too frequent=
ly
> (10 seconds apart)
> 2011-03-28 10:44:38 UTC3609HINT:=A0 Consider increasing the configuration
> parameter "checkpoint_segments".
> 2011-03-28 10:44:42 UTC3932ERROR:=A0 canceling statement due to statement
> timeout
> 2011-03-28 10:44:42 UTC3932STATEMENT:=A0 vacuum full analyze _zamboni.sl_=
log_1
> 2011-03-28 10:44:42 UTC3932PANIC:=A0 cannot abort transaction 1827110275,=
 it
> was already committed
> 2011-03-28 10:44:42 UTC3566LOG:=A0 server process (PID 3932) was terminat=
ed by
> signal 6
> 2011-03-28 10:44:42 UTC3566LOG:=A0 terminating any other active server
> processes
> 2011-03-28 10:44:42 UTC13142WARNING:=A0 terminating connection because of
> crash of another server process
> 2011-03-28 10:44:42 UTC13142DETAIL:=A0 The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2011-03-28 10:44:42 UTC13142HINT:=A0 In a moment you should be able to
> reconnect to the database and repeat your command.
> 2011-03-28 10:44:42 UTC29834WARNING:=A0 terminating connection because of
> crash of another server process
> 2011-03-28 10:44:42 UTC29834DETAIL:=A0 The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2011-03-28 10:44:42 UTC29834HINT:=A0 In a moment you should be able to
> reconnect to the database and repeat your command.
> 2011-03-28 10:44:42 UTC3553WARNING:=A0 terminating connection because of =
crash
> of another server process
> 2011-03-28 10:44:42 UTC3553DETAIL:=A0 The postmaster has commanded this s=
erver
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> In fact those last 3 lines are repeated over and over again repeatedly un=
til
> "UTC4115FATAL:=A0 the database system is in recovery mode" is logged for 4
> hours. At some point, 4 hours later of course, it appears that the system
> recovers.
>
> The Checkpoints Settings in postgresql.conf are commented out so I guess =
the
> defaults are being used:
> #checkpoint_segments =3D 3=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 #=
 in logfile segments, min 1, 16MB
> each
> #checkpoint_timeout =3D 5min=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 # ran=
ge 30s-1h
> #checkpoint_warning =3D 30s=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 # 0=
 is off
> That system where this was seen was using pgsql-8.2.6 on RHEL4.
>
> Not sure if this is a known bug (or if it is a bug at all or something I =
can
> address using different configuration) but I thought I would post here fi=
rst
> if any one might be familiar with this issue and suggest a possible
> solution. Any ideas?

Not really, although I've seen a few other crashes attributable to
trying to cancel VACUUM FULL.  As of PostgreSQL 9.0, VACUUM FULL has
been completely rewritten, so whatever bug is causing this problem is
almost certainly gone in more recent versions.  It would be nice to
track done and fix whatever is wrong in the older branches, but we'd
probably need a reproducible test case of some sort.  I don't think it
craps out like that EVERY time you interrupt it...

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Alexey Klyukin
Дата:
Сообщение: Re: BUG #6048: TRUNCATE vs TRUNCATE CASCADE: misleading message
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Unlogged tables cannot be truncated twice