Thread: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
"Wang, Mary Y"
Date:
Hi,
I'm having a bad day. My PostgreSQL has this error: "FATAL 2:  XLogFlush: request is not satisfied". I tried to follow
the instructions from a thread about looking for a core dump, but when I tried to start the postmaster, I got
"/usr/bin/postmaster: Startup proc 30595 exited with status 512 - abort".

How do I find the corrupted record, if that is the case?

My pg version is postgresql-7.1.3-2.  What are my options?  I read somewhere that I can run pg_resetxlog, but I can't
even find it in the path.

This is from the log file:

********************************************************************************************************************************
DEBUG:  database system was shut down at 2010-02-01 19:24:40 PST
DEBUG:  CheckPoint record at (2, 4173828852)
DEBUG:  Redo record at (2, 4173828852); Undo record at (0, 0); Shutdown TRUE
DEBUG:  NextTransactionId: 79259759; NextOid: 4395896
DEBUG:  database system is in production state
Smart Shutdown request at Mon Feb  1 20:08:59 2010
DEBUG:  shutting down
DEBUG:  database system is shut down
DEBUG:  database system shutdown was interrupted at 2010-02-02 07:54:51 PST
DEBUG:  CheckPoint record at (2, 4174741948)
DEBUG:  Redo record at (2, 4174741948); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 79276624; NextOid: 4412280
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (2, 4174742012)
DEBUG:  ReadRecord: record with zero len at (2, 4174768976)
DEBUG:  redo done at (2, 4174768916)
FATAL 2:  XLogFlush: request is not satisfied
/usr/bin/postmaster: Startup proc 30708 exited with status 512 - abort
DEBUG:  database system shutdown was interrupted at 2010-02-02 08:11:50 PST
DEBUG:  CheckPoint record at (2, 4174741948)
DEBUG:  Redo record at (2, 4174741948); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 79276624; NextOid: 4412280
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (2, 4174742012)
DEBUG:  ReadRecord: record with zero len at (2, 4174768976)
DEBUG:  redo done at (2, 4174768916)
FATAL 2:  XLogFlush: request is not satisfied
/usr/bin/postmaster: Startup proc 30722 exited with status 512 - abort
DEBUG:  database system shutdown was interrupted at 2010-02-02 08:18:59 PST
DEBUG:  CheckPoint record at (2, 4174741948)
DEBUG:  Redo record at (2, 4174741948); Undo record at (0, 0); Shutdown FALSE
DEBUG:  NextTransactionId: 79276624; NextOid: 4412280
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (2, 4174742012)
DEBUG:  ReadRecord: record with zero len at (2, 4174768976)
DEBUG:  redo done at (2, 4174768916)
FATAL 2:  XLogFlush: request is not satisfied
/usr/bin/postmaster: Startup proc 30788 exited with status 512 - abort
******************************************************************************************
Thanks in advance.
Mary

------------------------------------------------
Mary Y Wang



Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
Tom Lane
Date:
"Wang, Mary Y" <mary.y.wang@boeing.com> writes:
> I'm having a bad day. My Postgresql has this error "FATAL 2:  XLogFlush: request is not satisfied".   I tried to
> follow the instructions from a thread about looking for a core dump, but when I tried to start the postmaster, I got
> "/usr/bin/postmaster:Startup proc 30595 exited with status 512 - abort".

You've got a corrupted page that is affected by a WAL replay operation,
so things are pretty much a mess.

> My pg version is postgresql-7.1.3-2.  What are my options?

[ blanches... ]  You do realize that that version has been obsolete
since 2002?

pg_resetxlog was a contrib module in 7.1, so if you can find the
software repository you got postgresql from, you should be able to
install postgresql-contrib.  However, I am betting this thing is so
old that you don't even have the chance at doing that.

Most likely you're going to have to go back to your last backup.
After which, you should make it a priority to get onto a less
antique version of Postgres (and the underlying OS too, no doubt).

            regards, tom lane

Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
"Wang, Mary Y"
Date:
Tom,

Thanks for the help.
I was able to find pg_resetxlog in the path.  Are there any precautions that I need to be aware of?  Or do I just not
have any choice?
I'd like to do pg_dump or pg_dumpall.

Should I do a pg_resetxlog $PGDATA?

Here is my pg_control info:
------------------------------------------------------------------
pg_control version number:            71
Catalog version number:               200101061
Database state:                       SHUTDOWNING
pg_control last modified:             Tue Feb  2 08:19:14 2010
Current log file id:                  2
Next log file segment:                249
Latest checkpoint location:           2/F8D581BC
Prior checkpoint location:            2/F8D49A34
Latest checkpoint's REDO location:    2/F8D581BC
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's StartUpID:        210
Latest checkpoint's NextXID:          79276624
Latest checkpoint's NextOID:          4412280
Time of latest checkpoint:            Tue Feb  2 06:50:48 2010
Database block size:                  8192
Blocks per segment of large relation: 131072
LC_COLLATE:                           en_US
LC_CTYPE:                             en_US
---------------------------------------------------
In my pg_xlog directory, I have:
total 32808
-rw-------    1 postgres postgres 16777216 Feb  2 06:54 00000002000000F8
-rw-------    1 postgres postgres 16777216 Feb  1 10:51 00000002000000F9

------------------------------------------------
Thanks for any help.
Mary Y Wang
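
The positions in the server log and in the pg_control dump are the same numbers in different notation: the log prints the offset in decimal, pg_control prints it in hex. A small sketch of the conversion, assuming POSIX sh. The function names lsn_hex and xlog_segment_file are made up for illustration, and the 16 MB segment size is read off the file sizes in the pg_xlog listing rather than taken from the 7.1 source:

```shell
#!/bin/sh
# Convert a log position "(xlogid, xrecoff)", which the server log prints in
# decimal, into the hex "xlogid/xrecoff" form shown in the pg_control dump.
lsn_hex() {
    printf '%X/%X\n' "$1" "$2"
}

# Name of the pg_xlog segment file that holds the position, assuming
# 16777216-byte segments (the size of the two files in the listing).
xlog_segment_file() {
    printf '%08X%08X\n' "$1" "$(( $2 / 16777216 ))"
}

lsn_hex 2 4174741948            # prints 2/F8D581BC
xlog_segment_file 2 4174741948  # prints 00000002000000F8
```

So the "CheckPoint record at (2, 4174741948)" in the log is the "Latest checkpoint location: 2/F8D581BC" in pg_control, and it lives in the first of the two files listed in pg_xlog.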



Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
Tom Lane
Date:
"Wang, Mary Y" <mary.y.wang@boeing.com> writes:
> Thanks for the help
> I was able to find pg_resetxlog in the path.  Are there any precautions that I need to be aware of?  Or I just don't
> have any choice?

I'd suggest taking a tarball copy of the $PGDATA tree, so you can at
least get back to where you were if it doesn't work.
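
A minimal sketch of such a tarball copy, assuming GNU tar and a stopped postmaster. The function name backup_pgdata and the output path in the usage comment are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: archive a stopped cluster's data directory.
# $1 = data directory (for example "$PGDATA"), $2 = output tarball path.
backup_pgdata() {
    data_dir=$1
    out=$2
    [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }
    # -C switches to the parent directory so the archive stores a relative path.
    tar -czf "$out" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1
    # Listing the archive is a cheap check that it can be read back.
    tar -tzf "$out" >/dev/null
}

# Usage (as the postgres user, before running pg_resetxlog):
# backup_pgdata "$PGDATA" /some/other/disk/pgdata-copy.tar.gz
```

Writing the tarball to a different disk than $PGDATA avoids depending on the same storage that may have produced the corruption.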

Actually ... is this RHEL 2.1 as I suspect?  If so, can you find the
last RHEL 2.1 update, which was postgresql-7.1.3-7.rhel2.1AS?
There was a fix in that that might well address your issue:

* Wed Feb 23 2005 Tom Lane <tgl@redhat.com> 7.1.3-7.rhel2.1AS
- Back-patch community 7.2 change in error recovery behavior of XLogFlush;
  this allows successful restart of the database even in the presence of
  dubious values in LSN fields of database pages.

Alternatively, if you have the ability to rebuild the version you've
got, you're welcome to try adding the patch, which is attached.

            regards, tom lane

diff -Naur postgresql-7.1.3.orig/src/backend/access/transam/xlog.c postgresql-7.1.3/src/backend/access/transam/xlog.c
--- postgresql-7.1.3.orig/src/backend/access/transam/xlog.c    2001-08-16 14:36:37.000000000 -0400
+++ postgresql-7.1.3/src/backend/access/transam/xlog.c    2005-02-23 12:23:30.963333861 -0500
@@ -1242,14 +1242,42 @@
             WriteRqst.Flush = record;
             XLogWrite(WriteRqst);
             S_UNLOCK(&(XLogCtl->logwrt_lck));
-            if (XLByteLT(LogwrtResult.Flush, record))
-                elog(STOP, "XLogFlush: request is not satisfied");
             break;
         }
         S_LOCK_SLEEP(&(XLogCtl->logwrt_lck), spins++, XLOG_LOCK_TIMEOUT);
     }

     END_CRIT_SECTION();
+
+    /*
+     * If we still haven't flushed to the request point then we have a
+     * problem; most likely, the requested flush point is past end of
+     * XLOG. This has been seen to occur when a disk page has a corrupted
+     * LSN.
+     *
+     * Formerly we treated this as a PANIC condition, but that hurts the
+     * system's robustness rather than helping it: we do not want to take
+     * down the whole system due to corruption on one data page.  In
+     * particular, if the bad page is encountered again during recovery
+     * then we would be unable to restart the database at all!    (This
+     * scenario has actually happened in the field several times with 7.1
+     * releases. Note that we cannot get here while InRedo is true, but if
+     * the bad page is brought in and marked dirty during recovery then
+     * CreateCheckPoint will try to flush it at the end of recovery.)
+     *
+     * The current approach is to ERROR under normal conditions, but only
+     * NOTICE during recovery, so that the system can be brought up even
+     * if there's a corrupt LSN.  Note that for calls from xact.c, the
+     * ERROR will be promoted to PANIC since xact.c calls this routine
+     * inside a critical section.  However, calls from bufmgr.c are not
+     * within critical sections and so we will not force a restart for a
+     * bad LSN on a data page.
+     */
+    if (XLByteLT(LogwrtResult.Flush, record))
+        elog(InRecovery ? NOTICE : ERROR,
+             "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
+             record.xlogid, record.xrecoff,
+             LogwrtResult.Flush.xlogid, LogwrtResult.Flush.xrecoff);
 }

 /*

Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
"Wang, Mary Y"
Date:
Thanks Tom.

After talking to my co-worker, we decided to go back to the last backup (we used the pg_dumpall -c command).
However, when I entered "psql -f /usr/pgsql/backups/31.bak template1" to restore the database, I got "
psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?
".
I can't start my postmaster.  So how would I restore my last good backup?

Thanks in advance
Mary





Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
Scott Marlowe
Date:
On Tue, Feb 2, 2010 at 3:29 PM, Wang, Mary Y <mary.y.wang@boeing.com> wrote:
> Thanks Tom.
>
> After talking to my co-worker, we decided to go to the last backup (we used the pg_dumpall -c command).
> However, when I did enter "psql -f /usr/pgsql/backups/31.bak template1" to restore the database, I got "
> psql: connectDBStart() -- connect() failed: No such file or directory
>        Is the postmaster running locally
>        and accepting connections on Unix socket '/tmp/.s.PGSQL.5432'?
> ".
> I can't start my postmaster.  So how would I restore my last good backup?

The normal way is to drop the old cluster and create a new one.  I'm
not entirely sure how to do that on something as old as RHEL 2.1.
Usually you would mv or rm -rf the /var/lib/pgsql/data dir and
run initdb again.  Something like:

sudo /etc/init.d/pgsql stop
sudo rm -rf /var/lib/pgsql/data/*
sudo -u postgres initdb -D /var/lib/pgsql/data
sudo /etc/init.d/pgsql start

or something like that.
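
A more cautious variant of the same steps, as a sketch only: it moves the damaged cluster aside instead of deleting it, and then loads the dump with the psql command quoted earlier in the thread. The paths and the init script name are the ones mentioned in this thread and may differ on the actual system. The RUN variable and the function name are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: recreate the cluster and restore the dump, keeping the old data dir.
DATA_DIR=${DATA_DIR:-/var/lib/pgsql/data}
DUMP_FILE=${DUMP_FILE:-/usr/pgsql/backups/31.bak}
# Unset RUN means dry run (commands are echoed); RUN="" executes them.
RUN=${RUN-echo}

recreate_and_restore() {
    $RUN sudo /etc/init.d/pgsql stop
    # Keep the broken cluster around in case it is needed for forensics.
    $RUN sudo mv "$DATA_DIR" "$DATA_DIR.broken"
    $RUN sudo -u postgres initdb -D "$DATA_DIR"
    $RUN sudo /etc/init.d/pgsql start
    # The new postmaster must be running before psql can connect.
    $RUN psql -f "$DUMP_FILE" template1
}

recreate_and_restore
```

Running it once without RUN set shows exactly what would be executed, which is worth reviewing before anything touches the data directory.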

Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
"Wang, Mary Y"
Date:
Thanks Scott.
We used a command like this '/usr/bin/pg_dumpall -cC' postgres to do the pg_dumpall.

I started my postmaster with this command "/usr/bin/postmaster -D /var/lib/pgsql/data -i"
I got my tables back in the database, but I don't see any data.

What could have gone wrong when I did the "psql -f /usr/pgsql/backups/31.bak template1"?
Did I miss a step or something?

Now, I'm really worried.
Maybe I should have captured the log file when I did the restore?

Please advise.
Mary


------------------------------------------------
Mary Y Wang
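
On capturing the restore log: a minimal sketch, assuming the restore output is redirected to a file and that failures show up as lines containing ERROR or FATAL. The helper name restore_errors and the file name restore.log are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a captured restore log that look
# like errors, prefixed with their line numbers. Exit status is 0 only if
# at least one such line was found.
restore_errors() {
    grep -n -E 'ERROR|FATAL' "$1"
}

# Capturing the log during the restore itself would look like:
# psql -f /usr/pgsql/backups/31.bak template1 > restore.log 2>&1
# restore_errors restore.log
```

The lines just before a reported error in the log usually show which table was being created or loaded at that point, which helps when only one table ends up empty.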



Re: Startup proc 30595 exited with status 512 - abort and FATAL 2: XLogFlush

From
"Wang, Mary Y"
Date:
OK.  A little relief.  It looks like only one table has no data.  I guess I need to find out what caused that table to have no
data.  Going back to the drawing board.

Mary


