Re: corruption issue after server crash - ERROR: unexpected chunk number 0

From Mike Broers
Subject Re: corruption issue after server crash - ERROR: unexpected chunk number 0
Date
Msg-id CAB9893iYU=yPJV-R=Mrc4mKPkfX9PvwDJRhhRNz+e5tO=o8umw@mail.gmail.com
In reply to Re: corruption issue after server crash - ERROR: unexpected chunk number 0  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: corruption issue after server crash - ERROR: unexpected chunk number 0  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-general
Thanks for the response.  fsync and full_page_writes are both on.  

Our database runs on a managed hosting provider's VM host and SAN. I can ask them to provide hardware test results; do you have any specific diagnostics in mind?  The crash was apparently caused by our VM host suddenly losing power. The only row that has triggered the chunk error also propagated to both standby servers and, as previously stated, was fixed by reindexing the parent table on one of the standby servers after taking it out of recovery.  A vacuumdb -avz on this test copy reported no errors or warnings. I am also going to run pg_dumpall on this host to see whether any other rows are problematic.
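
A pg_dumpall run can emit a lot of output, so it may help to filter its captured stderr for corruption-style messages. The following is only a minimal sketch: the pattern list is an assumption and is not exhaustive, and the sample text reuses the error quoted in this thread next to a made-up filler line.

```python
import re

# Message fragments that usually point at damaged data. The first is the
# error from this thread; the others are assumptions about what a damaged
# cluster might additionally report. The list is not exhaustive.
SUSPECT_PATTERNS = [
    re.compile(r"unexpected chunk number"),
    re.compile(r"missing chunk number"),
    re.compile(r"invalid page header"),
    re.compile(r"could not read block"),
]


def find_suspect_lines(log_text):
    """Return (line_number, line) pairs whose text matches a suspect pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical captured output: one filler line plus the error from this thread.
sample = "\n".join([
    "an unrelated line of output",
    "ERROR:  unexpected chunk number 0 (expected 1) "
    "for toast value 117927127 in pg_toast_19122",
])

for number, line in find_suspect_lines(sample):
    print(number, line)
```

The text to scan could be captured with something like `pg_dumpall > /dev/null 2> dump_errors.log` (the file name is hypothetical). A clean result only shows that every row could be read; it says nothing about whether the indexes are consistent.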

Is there something else I can run after the pg_dumpall to confirm we are more or less OK at the database level, or is there no way to be sure short of a fresh initdb?

I am planning to run the reindex in production tonight during our maintenance window, and was hoping that if it works we will be out of the woods.



On Thu, Nov 21, 2013 at 3:56 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Mike Broers <mbroers@gmail.com> wrote:

> Hello we are running postgres 9.2.5 on RHEL6, our production
> server crashed hard and when it came back up our logs were
> flooded with:

> ERROR:  unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122
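
The error line above carries enough information to find the affected table. As a rough sketch (the regular expression is written against the wording of this one log line, and the function names are hypothetical), the fields can be extracted and turned into a catalog query that looks up which table owns the named TOAST relation:

```python
import re

# Pattern for the TOAST chunk error quoted in this thread. Other PostgreSQL
# versions may word the message differently, so treat this as an assumption.
TOAST_ERROR = re.compile(
    r"unexpected chunk number (?P<got>\d+) \(expected (?P<expected>\d+)\) "
    r"for toast value (?P<value>\d+) in (?P<toast_table>pg_toast_\d+)"
)


def parse_toast_error(line):
    """Return the fields of a TOAST chunk error as a dict, or None if no match."""
    match = TOAST_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric fields become ints; the TOAST table name stays a string.
    return {
        key: (text if key == "toast_table" else int(text))
        for key, text in fields.items()
    }


def owning_table_sql(toast_table):
    """Build a query that finds the table whose TOAST relation is toast_table."""
    # Only names of the form pg_toast_<digits> are accepted, so the plain
    # string formatting below cannot inject anything unexpected.
    if re.fullmatch(r"pg_toast_\d+", toast_table) is None:
        raise ValueError("unexpected TOAST table name: " + toast_table)
    return (
        "SELECT oid::regclass AS owning_table FROM pg_class "
        "WHERE reltoastrelid = 'pg_toast.{}'::regclass;".format(toast_table)
    )


line = ("ERROR:  unexpected chunk number 0 (expected 1) "
        "for toast value 117927127 in pg_toast_19122")
info = parse_toast_error(line)
print(info)
print(owning_table_sql(info["toast_table"]))
```

The generated query has to be run in the database that logged the error, since pg_class is per-database.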

Your database is corrupted.  Unless you were running with fsync =
off or full_page_writes = off, that should not happen.  It is
likely to be caused by a hardware problem (bad RAM, a bad disk
drive, or network problems if your storage is across a network).

If it were me, I would stop the database service and copy the full
data directory tree.

http://wiki.postgresql.org/wiki/Corruption

If fsync or full_page_writes were off, your best bet is probably to
go to your backup.  If you don't go to a backup, you should try to
get to a point where you can run pg_dump, and dump and load to a
freshly initdb'd cluster.
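
The dump-and-reload route described above could be laid out roughly as follows. This is only a sketch of the order of operations: the ports, data directory, and dump file are hypothetical placeholders, the function merely builds the command lines without running them, and a real migration would also need to carry over configuration files and check each step's exit status.

```python
import shlex


def dump_and_reload_plan(old_port, new_port, new_datadir, dump_file):
    """Return the commands, in order, as argv lists. Nothing is executed."""
    return [
        # 1. Dump every database from the suspect cluster into one SQL script.
        ["pg_dumpall", "-p", str(old_port), "-f", dump_file],
        # 2. Create a brand-new, empty cluster.
        ["initdb", "-D", new_datadir],
        # 3. Start the new cluster on its own port and wait until it is up.
        ["pg_ctl", "-D", new_datadir, "-o", "-p {}".format(new_port), "-w", "start"],
        # 4. Replay the dump into the new cluster.
        ["psql", "-p", str(new_port), "-d", "postgres", "-f", dump_file],
    ]


# Hypothetical values, chosen only for illustration.
plan = dump_and_reload_plan(5432, 5433, "/srv/pgsql/new_data", "/srv/pgsql/all.sql")
for argv in plan:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the untouched copy of the original data directory around until the new cluster has been verified leaves a way back if the reload turns up further damage.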

If fsync and full_page_writes were both on, you should run hardware
diagnostics at your earliest opportunity.  When hardware starts to
fail, the first episode is rarely the last or the most severe.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
