Thread: BUG #1800: "unexpected chunk number" during pg_dump

BUG #1800: "unexpected chunk number" during pg_dump

From
"Aaron Harsh"
Date:
The following bug has been logged online:

Bug reference:      1800
Logged by:          Aaron Harsh
Email address:      ajh@rentrak.com
PostgreSQL version: 7.4.6
Operating system:   RedHat ES release 3, x86_64
Description:        "unexpected chunk number" during pg_dump
Details:

Our regular pg_dump aborted this afternoon with this output:

pg_dump: ERROR:  unexpected chunk number 0 (expected 1) for toast value
4294879152
pg_dump: SQL command to dump the contents of table "dataset_cache" failed:
PQendcopy() failed.
pg_dump: Error message from server: ERROR:  unexpected chunk number 0
(expected 1) for toast value 4294879152
pg_dump: The command was: COPY public.dataset_cache (checksum, version_no,
bind_params, sql_statement, serialized_value, date_created) TO stdout;
pg_dump: *** aborted because of error

We saw the same message when we tried to cluster the affected table.  The
problem went away after truncating the table.

I've searched the pgsql-bugs archives and found reports of this problem, but
haven't seen a solution.  Is there a solution to keep this from happening in
the future? (Version upgrade maybe?)

Re: BUG #1800: "unexpected chunk number" during pg_dump

From
Alvaro Herrera
Date:
On Mon, Aug 01, 2005 at 06:02:30AM +0100, Aaron Harsh wrote:

> pg_dump: ERROR:  unexpected chunk number 0 (expected 1) for toast value
> 4294879152
> pg_dump: SQL command to dump the contents of table "dataset_cache" failed:
> PQendcopy() failed.
> pg_dump: Error message from server: ERROR:  unexpected chunk number 0
> (expected 1) for toast value 4294879152
> pg_dump: The command was: COPY public.dataset_cache (checksum, version_no,
> bind_params, sql_statement, serialized_value, date_created) TO stdout;
> pg_dump: *** aborted because of error
>
> We saw the same message when we tried to cluster the affected table.  The
> problem went away after truncating the table.
>
> I've searched the pgsql-bugs archives and found reports of this problem, but
> haven't seen a solution.  Is there a solution to keep this from happening in
> the future? (Version upgrade maybe?)

Looks very much like the table was corrupted.  Maybe you should try to
test your RAM and disks.  Not sure how to do that on x86-64 though,
unless the test utility at www.memtest86.com has been ported to it.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La naturaleza, tan frágil, tan expuesta a la muerte... y tan viva"

Re: BUG #1800: "unexpected chunk number" during pg_dump

From
Oliver Jowett
Date:
Alvaro Herrera wrote:

> Looks very much like the table was corrupted.  Maybe you should try to
> test your RAM and disks.  Not sure how to do that on x86-64 though,
> unless the test utility at www.memtest86.com has been ported to it.

x86-64 systems will still boot and run 32-bit code fine (although
obviously memtest86 isn't going to test memory it can't address in
32-bit mode)

-O

Re: BUG #1800: "unexpected chunk number" during pg_dump

From
"Aaron Harsh"
Date:
> >>> Alvaro Herrera <alvherre@alvh.no-ip.org> 08/10/05 9:03 AM >>>
> On Mon, Aug 01, 2005 at 06:02:30AM +0100, Aaron Harsh wrote:
> > pg_dump: ERROR:  unexpected chunk number 0 (expected 1) for toast value
> > ...
> Looks very much like the table was corrupted.  Maybe you should try to
> test your RAM and disks.  Not sure how to do that on x86-64 though,
> unless the test utility at www.memtest86.com has been ported to it.

The server is running off of ECC RAM on a RAID-10 set, so a one-off
disk/RAM failure seems unlikely.  The server had been running
beautifully for 6 months prior to this error, and hasn't been
evidencing the problem since, so it seems unlikely that this is due to
a bad DIMM or RAID controller.

The timing might be a coincidence, but this error happened within a
day of our OID counter wrapping around back to 0.  (Although Tom Lane
mentioned in pgsql-general that he was inclined to consider the timing
a coincidence).


--
Aaron Harsh
ajh@rentrak.com
503-284-7581 x347

Re: BUG #1800: "unexpected chunk number" during pg_dump

From
Alvaro Herrera
Date:
On Wed, Aug 10, 2005 at 06:07:24PM -0700, Aaron Harsh wrote:
> > >>> Alvaro Herrera <alvherre@alvh.no-ip.org> 08/10/05 9:03 AM >>>
> > On Mon, Aug 01, 2005 at 06:02:30AM +0100, Aaron Harsh wrote:
> > > pg_dump: ERROR:  unexpected chunk number 0 (expected 1) for toast value
> > > ...
> > Looks very much like the table was corrupted.  Maybe you should try to
> > test your RAM and disks.  Not sure how to do that on x86-64 though,
> > unless the test utility at www.memtest86.com has been ported to it.
>
> The server is running off of ECC RAM on a RAID-10 set, so a one-off
> disk/RAM failure seems unlikely.  The server had been running
> beautifully for 6 months prior to this error, and hasn't been
> evidencing the problem since, so it seems unlikely that this is due to
> a bad DIMM or RAID controller.
>
> The timing might be a coincidence, but this error happened within a
> day of our OID counter wrapping around back to 0.  (Although Tom Lane
> mentioned in pgsql-general that he was inclined to consider the timing
> a coincidence).

Not sure what else to attribute the failure to then.  But I should point
out that Oid normally wraps to FirstNormalObjectId (known as
BootstrapObjectIdData on previous sources), which is 16384, not 0.

Anyway I was originally thinking the problem data was 4294879152
(0xFFFEA7B0), not the 0.  Have you tried to manually extract the data
from the dataset_cache table?  You could try figuring out what page
contains the bad data, and manually peek into it using pg_filedump.
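
[Editorial aside, not part of the original message.] The hex value quoted
above can be checked with a little arithmetic.  The sketch below assumes
only that OIDs are unsigned 32-bit integers; the helper name
distance_to_wrap is made up for illustration.  It shows how close the
reported toast value sits to the top of the OID range.

```python
# Illustrative arithmetic only; the constants come from the thread.
OID_MODULUS = 2 ** 32            # assumption: OIDs are unsigned 32-bit values
FIRST_NORMAL_OBJECT_ID = 16384   # wrap target cited in the message above

def distance_to_wrap(oid: int) -> int:
    """Distance from `oid` to 2**32 (hypothetical helper)."""
    return OID_MODULUS - oid

toast_value = 4294879152             # toast value from the error message
print(hex(toast_value))              # 0xfffea7b0
print(distance_to_wrap(toast_value)) # 88144
```

The toast value is 88144 below 2**32, which is consistent with a value
assigned shortly before the counter wrapped.  This alone does not show
that the wraparound caused the error.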

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"

Re: BUG #1800: "unexpected chunk number" during pg_dump

From
"Aaron Harsh"
Date:
> Alvaro Herrera <alvherre@alvh.no-ip.org> 08/11/05 9:52 AM >>>
> Anyway I was originally thinking the problem data was 4294879152
> (0xFFFEA7B0), not the 0.  Have you tried to manually extract the data
> from the dataset_cache table?  You could try figuring out what page
> contains the bad data, and manually peek into it using pg_filedump.

Unfortunately, the table doesn't show any problems now (I truncated it
after the pg_dump failed) and so there's not a lot of further detail I
can give you.  I suppose this means that we'll have to wait until such
time as the problem shows up again before we can continue.

Thanks for your help.

--
Aaron Harsh
ajh@rentrak.com
503-284-7581 x347