Thread: Fail to connect after server crash

Fail to connect after server crash

From
Kyle Wilcox
Date:
While I was running some programs overnight to populate the database, the
server crashed.  I now cannot start the database server.  I have a backup
of the database, but to restore it I will need to connect...  any help is
much appreciated.

Here is the log from the crash (looks like I ran out of disk space?
there is over 200gb free space...) :

2008-01-04 00:56:34 ERROR:  could not extend relation 1663/42463/47343:
Permission denied
2008-01-04 00:56:34 HINT:  Check free disk space.
2008-01-04 00:56:34 STATEMENT:  INSERT INTO mdm.archive_data
(utctime,value,variableid,grid_id,initial_time) values (TIMESTAMP
'2003-09-23 07:45:00',33.000000,6,17997,5954)
2008-01-04 00:56:34 ERROR:  could not read block 59762 of relation
1663/42463/47343: Permission denied
2008-01-04 00:56:34 STATEMENT:  INSERT INTO mdm.archive_data
(utctime,value,variableid,grid_id,initial_time) values (TIMESTAMP
'2003-09-23 07:45:00',33.000000,7,17997,5954)
2008-01-04 00:56:34 ERROR:  could not extend relation 1663/42463/47343:
Permission denied
2008-01-04 00:56:34 HINT:  Check free disk space.
2008-01-04 00:56:34 STATEMENT:  INSERT INTO mdm.archive_data
(utctime,value,variableid,grid_id,initial_time) values (TIMESTAMP
'2003-09-23 07:45:00',-0.340699,8,17997,5954)
2008-01-04 00:56:34 ERROR:  could not read block 59762 of relation
1663/42463/47343: Permission denied
2008-01-04 00:56:34 STATEMENT:  INSERT INTO mdm.archive_data
(utctime,value,assist,rotation,variableid,grid_id,initial_time) values
(TIMESTAMP '2003-09-23 07:45:00',0.46,158,158,3,17999,5954)
2008-01-04 00:56:34 ERROR:  could not extend relation 1663/42463/47343:
Permission denied
2008-01-04 00:56:34 HINT:  Check free disk space.
2008-01-04 00:56:34 STATEMENT:  INSERT INTO mdm.archive_data
(utctime,value,rotation,variableid,grid_id,initial_time) values
(TIMESTAMP '2003-09-23 07:45:00',158,158,4,17999,5954)
2008-01-04 00:56:36 ERROR:  failed to re-find parent key in "Uvalue2"
for split pages 148155/148160
2008-01-04 00:56:36 STATEMENT:  INSERT INTO mdm.archive_data
(utctime,value,assist,rotation,variableid,grid_id,initial_time) values
(TIMESTAMP '2003-09-23 07:45:00',0.02,309,309,3,18083,5954)
...
...
...
This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.
2008-01-04 00:57:36 LOG:  server process (PID 3468) exited with exit code 3
2008-01-04 00:57:36 LOG:  terminating any other active server processes
2008-01-04 00:57:36 WARNING:  terminating connection because of crash of
another server process
2008-01-04 00:57:36 DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2008-01-04 00:57:36 HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2008-01-04 00:57:36 WARNING:  terminating connection because of crash of
another server process
2008-01-04 00:57:36 DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2008-01-04 00:57:36 HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2008-01-04 00:57:36 WARNING:  terminating connection because of crash of
another server process
2008-01-04 00:57:36 DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2008-01-04 00:57:36 HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2008-01-04 00:57:36 WARNING:  terminating connection because of crash of
another server process
2008-01-04 00:57:36 DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2008-01-04 00:57:36 HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2008-01-04 00:57:36 WARNING:  terminating connection because of crash of
another server process
2008-01-04 00:57:36 DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2008-01-04 00:57:36 HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2008-01-04 00:57:36 LOG:  all server processes terminated; reinitializing
2008-01-04 00:57:36 LOG:  database system was interrupted at 2008-01-04
00:56:43 Eastern Standard Time
2008-01-04 00:57:36 LOG:  checkpoint record is at 8/C2F23AC8
2008-01-04 00:57:36 LOG:  redo record is at 8/C2EFD818; undo record is
at 0/0; shutdown FALSE
2008-01-04 00:57:36 LOG:  next transaction ID: 0/29414907; next OID: 55512
2008-01-04 00:57:36 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2008-01-04 00:57:36 LOG:  database system was not properly shut down;
automatic recovery in progress
2008-01-04 00:57:37 LOG:  redo starts at 8/C2EFD818
2008-01-04 00:57:37 LOG:  could not read from log file 8, segment 195,
offset 0: Permission denied
2008-01-04 00:57:37 LOG:  redo done at 8/C2FFFFA0
2008-01-04 00:57:38 FATAL:  failed to re-find parent key in "47340" for
split pages 148155/148160
2008-01-04 00:57:38 LOG:  startup process (PID 5452) exited with exit
code 1
2008-01-04 00:57:38 LOG:  aborting startup due to startup process failure
2008-01-04 00:57:38 LOG:  logger shutting down



Here is the log file if I try to start the database server:

2008-01-04 10:48:48 LOG:  database system was interrupted while in
recovery at 2008-01-04 00:57:36 Eastern Standard Time
2008-01-04 10:48:48 HINT:  This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2008-01-04 10:48:48 LOG:  checkpoint record is at 8/C2F23AC8
2008-01-04 10:48:48 LOG:  redo record is at 8/C2EFD818; undo record is
at 0/0; shutdown FALSE
2008-01-04 10:48:48 LOG:  next transaction ID: 0/29414907; next OID: 55512
2008-01-04 10:48:48 LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2008-01-04 10:48:48 LOG:  database system was not properly shut down;
automatic recovery in progress
2008-01-04 10:48:48 LOG:  redo starts at 8/C2EFD818
2008-01-04 10:48:49 LOG:  record with zero length at 8/C3BBE410
2008-01-04 10:48:49 LOG:  redo done at 8/C3BBE3E0
2008-01-04 10:48:49 FATAL:  failed to re-find parent key in "47340" for
split pages 148155/148160
2008-01-04 10:48:49 LOG:  startup process (PID 4864) exited with exit
code 1
2008-01-04 10:48:49 LOG:  aborting startup due to startup process failure
2008-01-04 10:48:49 LOG:  logger shutting down




--

 Kyle Wilcox
 NOAA Chesapeake Bay Office
 410 Severn Avenue
 Suite 107A
 Annapolis, MD 21403
 office: (410) 295-3151
 Kyle.Wilcox@noaa.gov

 "It is from the wellspring of our despair and the places
  that we are broken that we come to repair the world."
                        - Murray Waas

Re: Fail to connect after server crash

From
Jonathan Ballet
Date:
Kyle Wilcox wrote:
> Here is the log from the crash (looks like I ran out of disk space?
> there is over 200gb free space...) :
>
> 2008-01-04 00:56:34 ERROR:  could not extend relation 1663/42463/47343:
> Permission denied
   ^^^^^^^^^^^^^^^^^
> 2008-01-04 00:56:34 HINT:  Check free disk space.

This looks like a permission problem rather than a lack of disk space.
However, I don't know why :)
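
One way to tell the two causes apart is to check both directly. The following is only a minimal sketch, not anything posted in the thread: the function name `diagnose_write_failure` is hypothetical, and the result is only meaningful if the script runs under the same account as the database service and is pointed at the real data directory.

```python
import os
import shutil
import tempfile


def diagnose_write_failure(directory):
    """Report free space and whether this process can create a file in `directory`."""
    usage = shutil.disk_usage(directory)
    result = {"free_bytes": usage.free, "can_write": True, "error": None}
    try:
        # Creating and removing a temporary file needs roughly the same
        # kind of access the server needs to extend a relation file.
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        os.remove(path)
    except OSError as exc:
        result["can_write"] = False
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    # The temp directory is a stand-in; substitute the real data directory.
    print(diagnose_write_failure(tempfile.gettempdir()))
```

Plenty of free bytes combined with `can_write` being false would point at permissions or a lock rather than at disk space. A transient lock held by another program may of course be gone by the time the check runs.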

  - Jonathan

Re: Fail to connect after server crash

From
"Scott Marlowe"
Date:
Like Jonathan pointed out, looks like a permission issue.  We see
these things show up on Windows boxes quite often when a virus checker
/ spyware checker kicks in and does something stupid like lock a file
that postgresql needs to have exclusive access to.

Simple answer there is to put postgresql on its own machine with no
spyware detection / anti-virus software on it, or more complexly, tell
that software to ignore the directories where postgresql lives.
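
To see which files were affected, and so which directories need excluding, the server log can be scanned for the "Permission denied" errors. This is a rough sketch with a hypothetical helper name (`denied_relation_files`). It assumes the `tablespace/database/relfilenode` triple format shown in the log above, and relies on 1663 being the OID of the default tablespace, whose files live under `base/` in the data directory. Other tablespaces are reported as the raw triple.

```python
import re
from collections import Counter

# Matches e.g. "relation 1663/42463/47343: Permission denied"
RELATION_DENIED_RE = re.compile(r"relation (\d+)/(\d+)/(\d+): Permission denied")
PG_DEFAULT_TABLESPACE_OID = "1663"


def denied_relation_files(log_text):
    """Count 'Permission denied' errors per relation file named in the log."""
    # Log entries wrap across lines in the mail, so collapse whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for tablespace, database, relfilenode in RELATION_DENIED_RE.findall(flat):
        if tablespace == PG_DEFAULT_TABLESPACE_OID:
            # Path relative to the data directory for the default tablespace.
            key = f"base/{database}/{relfilenode}"
        else:
            key = f"{tablespace}/{database}/{relfilenode}"
        counts[key] += 1
    return counts
```

Fed the crash log from the first message, this should report a single file, `base/42463/47343`, which suggests the lock was on one relation file inside the data directory.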

Re: Fail to connect after server crash

From
Kyle Wilcox
Date:
Checked some logs and a backup job was running at the same time as the
insertions.  Must have been the backup job locking a file.  Is there a
way to get the server to restart or should I reinstall?



Scott Marlowe wrote:
> Like Jonathan pointed out, looks like a permission issue.  We see
> these things show up on Windows boxes quite often when a virus checker
> / spyware checker kicks in and does something stupid like lock a file
> that postgresql needs to have exclusive access to.
>
> Simple answer there is to put postgresql on its own machine with no
> spyware detection / anti-virus software on it, or more complexly, tell
> that software to ignore the directories where postgresql lives.


Re: Fail to connect after server crash

From
"Scott Marlowe"
Date:
On Jan 4, 2008 11:46 AM, Kyle Wilcox <Kyle.Wilcox@noaa.gov> wrote:
> Checked some logs and a backup job was running at the same time as the
> insertions.  Must have been the backup job locking a file.  Is there a
> way to get the server to restart or should I reinstall?

Seeing as your backup program appears to have corrupted your database,
I'd say blowing away the PGDATA dir and running a new initdb is the
answer (not reinstalling).

Then you need to disable your backup job that caused this problem.
And shoot the person who set it up.  Or not.  That last step is
strictly optional.  :-)
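
The re-initialization step could be planned along these lines. This is a cautious sketch only: `plan_reinit` is a hypothetical helper that runs nothing. It deliberately proposes renaming the damaged data directory instead of deleting it, which is a substitution for the "blow away" advice above. The restore step is left out because the thread does not say how the backup was made.

```python
import time
from pathlib import Path


def plan_reinit(pgdata, stamp=None):
    """Return (aside_path, commands) for re-creating a cluster at `pgdata`.

    Nothing is executed here. The caller renames `pgdata` to `aside_path`
    and then runs the commands in order.
    """
    pgdata = Path(pgdata)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    # Keep the damaged cluster under a new name in the same parent directory.
    aside = pgdata.with_name(f"{pgdata.name}.corrupt-{stamp}")
    commands = [
        ["initdb", "-D", str(pgdata)],           # create a fresh, empty cluster
        ["pg_ctl", "-D", str(pgdata), "start"],  # start the server on it
    ]
    return aside, commands
```

With the server stopped and the backup job disabled, the caller would rename the directory (for example with `Path.rename`), run each command as the service account (for example with `subprocess.run(cmd, check=True)`), and then restore the backup with whatever tool produced it.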