Обсуждение: ERROR: cache lookup failed for type 0

Поиск

Список

Период

Сортировка

ERROR: cache lookup failed for type 0

От

"Michael Harris \(BR/EPA\)"

Дата:

25 мая 2007 г., 06:09:27

Hi,

Am not sure if this is something we've done wrong or maybe a bug.

Whenever any kind of query is done on the table below, this is the
result:

ispdb_vxe=> select * from pm.carrier_on_13642;
ERROR:  cache lookup failed for type 0

I first noticed it when I noticed that the regular backups were failing
- pg_dump gives the same output and stops.

I have tried
1. postgresql restart
2. vacuum full on the table
3. reindex of the table

I am using Postgres 8.2.3 running on Centos Linux 4.4.

Is there anything I can do to recover this table?

Is there any danger in dropping the table if it cannot be recreated?

Should this be reported as a bug? (although I have no clue as to how it
got into this state).

Regards,
Mike

Re: ERROR: cache lookup failed for type 0

От

Tom Lane

Дата:

25 мая 2007 г., 13:52:37

"Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes:
> Whenever any kind of query is done on the table below, this is the
> result:

> ispdb_vxe=> select * from pm.carrier_on_13642;
> ERROR:  cache lookup failed for type 0

Hmm, seems like something happened to your system catalogs, but where
exactly?  Try this:

select * from pg_attribute where attrelid = 'pm.carrier_on_13642'::regclass;

and look whether any of the atttypid fields are zero.  If so, does
anything else look wrong?  If not, the problem must be further afield
--- have you got any triggers, check constraints, etc on this table?
(Can you do "\d" on it?)

            regards, tom lane

Re: ERROR: cache lookup failed for type 0

От

"Michael Harris \(BR/EPA\)"

Дата:

25 мая 2007 г., 23:27:06

Hi Tom,

Thanks for your response.

None of the typeid atttypid fields were 0:

 attrelid |          attname          | atttypid | attstattarget |
attlen | attnum | attndims | attcacheoff | atttypmod | attbyval |
attstorage | attalign | attnotnull | atthasdef | attisdropped |
attislocal | attinhcount
----------+---------------------------+----------+---------------+------
--+--------+----------+-------------+-----------+----------+------------
+----------+------------+-----------+--------------+------------+-------
------
   167581 | tableoid                  |       26 |             0 |
4 |     -7 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | t          |
0
   167581 | cmax                      |       29 |             0 |
4 |     -6 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | t          |
0
   167581 | xmax                      |       28 |             0 |
4 |     -5 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | t          |
0
   167581 | cmin                      |       29 |             0 |
4 |     -4 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | t          |
0
   167581 | xmin                      |       28 |             0 |
4 |     -3 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | t          |
0
   167581 | ctid                      |       27 |             0 |
6 |     -1 |        0 |          -1 |        -1 | f        | p
| s        | t          | f         | f            | t          |
0
   167581 | ropid                     |       23 |            -1 |
4 |      1 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | f          |
1
   167581 | moid                      |       23 |            -1 |
4 |      2 |        0 |          -1 |        -1 | t        | p
| i        | t          | f         | f            | f          |
1
   167581 | tblock                    |       21 |            -1 |
2 |      3 |        0 |          -1 |        -1 | t        | p
| s        | t          | f         | f            | f          |
1
   167581 | pmtransmittedcarrierpower |     1016 |            -1 |
-1 |      4 |        0 |          -1 |        -1 | f        | x
| d        | f          | f         | f            | f          |
1
   167581 | pmaveragerssi             |     1016 |            -1 |
-1 |      5 |        0 |          -1 |        -1 | f        | x
| d        | f          | f         | f            | f          |
1
   167581 | pm_count                  |       20 |            -1 |
8 |      6 |        0 |          -1 |        -1 | f        | p
| d        | f          | f         | f            | f          |
1
(12 rows)

I don't know enough about this catalog table to say if anything else is
unusual about it. Comparing it with other tables that do not have a
problem did not reveal anything startling (to my eyes anyway).

The table is part of an inheritance structure. We are using inheritance
to partition the data based on a number of criteria.  The top level
table ('carrier') has no records of it's own but defines the structure
of information stored in the table set. It's descendants (eg.
'carrier_on') are for either the original raw data, or data summarised
in one way or another. The third level tables (eg 'carrier_on_13642')
are divided into blocks of time.

All of the tables in this structure (and also parallel table structures
that we have) are linked back to two central tables via foreign keys on
the fields ropid and moid. There are a lot of relations which have these
fields as foreign references: this is causing another (unrelated I
think) problem: when we try to delete records from the table containing
the field ropid, it bombs with 'out of memory'.

=> \d pm.carrier_on_13642;
           Table "pm.carrier_on_13642"
          Column           |   Type   | Modifiers
---------------------------+----------+-----------
 ropid                     | integer  | not null
 moid                      | integer  | not null
 tblock                    | smallint | not null
 pmtransmittedcarrierpower | bigint[] |
 pmaveragerssi             | bigint[] |
 pm_count                  | bigint   |
Indexes:
    "carrier_on_13642_pkey" PRIMARY KEY, btree (ropid, moid)
    "carrier_on_13642_moid_idx" btree (moid)
    "carrier_on_13642_ropid_idx" btree (ropid)
Check constraints:
    "carrier_on_13642_tblock_check" CHECK (tblock = 13642::smallint)
Foreign-key constraints:
    "carrier_on_13642_moid_fkey" FOREIGN KEY (moid) REFERENCES
pm.mo(moid) ON DELETE CASCADE
    "carrier_on_13642_ropid_fkey" FOREIGN KEY (ropid) REFERENCES
pm.rop(ropid) ON DELETE CASCADE
Inherits: carrier_on

Last night I did a pg_dump excluding this particular table. It failed,
complaining about another table pm.carrier_oo_13642 with the same error.
I then excluded that table also, after which the dump succeeded.

What does "ERROR: cache lookup failed for type 0" mean? I searched all
over the place for a good descripion but could not find one.

Regards // Mike

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, 25 May 2007 11:52 PM
To: Michael Harris (BR/EPA)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ERROR: cache lookup failed for type 0

"Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes:
> Whenever any kind of query is done on the table below, this is the
> result:

> ispdb_vxe=> select * from pm.carrier_on_13642;
> ERROR:  cache lookup failed for type 0

Hmm, seems like something happened to your system catalogs, but where
exactly?  Try this:

select * from pg_attribute where attrelid =
'pm.carrier_on_13642'::regclass;

and look whether any of the atttypid fields are zero.  If so, does
anything else look wrong?  If not, the problem must be further afield
--- have you got any triggers, check constraints, etc on this table?
(Can you do "\d" on it?)

            regards, tom lane

Re: ERROR: cache lookup failed for type 0

От

Tom Lane

Дата:

25 мая 2007 г., 23:38:26

"Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes:
> None of the typeid atttypid fields were 0:

Hmm, but you've got a couple of bigint-array columns:

>    167581 | pmtransmittedcarrierpower |     1016 |            -1 |
> -1 |      4 |        0 |          -1 |        -1 | f        | x
> | d        | f          | f         | f            | f          |
> 1
>    167581 | pmaveragerssi             |     1016 |            -1 |
> -1 |      5 |        0 |          -1 |        -1 | f        | x
> | d        | f          | f         | f            | f          |
> 1

So that leads to a theory I didn't think of before, which is that this
is not catalog corruption but data corruption.  Array values include the
type OID of their elements (to allow a single array_out function to
print the contents of any array value).  If you had a row in which one
of these fields had got zeroed out somehow, the error would be
explainable.

> What does "ERROR: cache lookup failed for type 0" mean?

It means some bit of code tried to look up a pg_type entry for type OID 0,
which can't possibly be a valid type OID.  So wherever the bit of code
got that OID from is corrupted.

You could possibly learn a bit more by doing "\set VERBOSITY verbose"
in psql before provoking the error; that would tell you exactly which
bit of code is trying to look up the bad OID.  Using gdb to get a stack
trace from the errfinish call would be even more informative, if you
have a debug-enabled build.  But I suspect at this point that it's a
corrupt-data problem and you should attack it on the basis of figuring
out just which row contains the bad value, so you can fix or delete it.

            regards, tom lane

Re: ERROR: cache lookup failed for type 0

От

"Michael Harris \(BR/EPA\)"

Дата:

26 мая 2007 г., 01:09:35

Hi Tom,

Hmm .. Interesting. A few weeks back I had a corrupted table (due to a
disk error). I temporarily set zero_damaged_pages and vacuumed the
tables concerned. Now that I look back through the logs, the affected
table was utrancell_oo_13642. The fact that the number on the end was
the same means that the table contained data from the same day as the
one that is now giving trouble (ie. This table was being populated on
the same day as carrier_on_13642). Maybe the carrier table was also
corrupted by the disk error (although I did not notice any problem at
the time).

Thanks for the info. I will try to locate the corrupted row(s), maybe
pg_filedump can help with that.

Regards // Mike

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Saturday, 26 May 2007 9:38 AM
To: Michael Harris (BR/EPA)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] ERROR: cache lookup failed for type 0

"Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes:
> None of the typeid atttypid fields were 0:

Hmm, but you've got a couple of bigint-array columns:

>    167581 | pmtransmittedcarrierpower |     1016 |            -1 |
> -1 |      4 |        0 |          -1 |        -1 | f        | x
> | d        | f          | f         | f            | f          |
> 1
>    167581 | pmaveragerssi             |     1016 |            -1 |
> -1 |      5 |        0 |          -1 |        -1 | f        | x
> | d        | f          | f         | f            | f          |
> 1

So that leads to a theory I didn't think of before, which is that this
is not catalog corruption but data corruption.  Array values include the
type OID of their elements (to allow a single array_out function to
print the contents of any array value).  If you had a row in which one
of these fields had got zeroed out somehow, the error would be
explainable.

> What does "ERROR: cache lookup failed for type 0" mean?

It means some bit of code tried to look up a pg_type entry for type OID
0, which can't possibly be a valid type OID.  So wherever the bit of
code got that OID from is corrupted.

You could possibly learn a bit more by doing "\set VERBOSITY verbose"
in psql before provoking the error; that would tell you exactly which
bit of code is trying to look up the bad OID.  Using gdb to get a stack
trace from the errfinish call would be even more informative, if you
have a debug-enabled build.  But I suspect at this point that it's a
corrupt-data problem and you should attack it on the basis of figuring
out just which row contains the bad value, so you can fix or delete it.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: ERROR: cache lookup failed for type 0

ERROR: cache lookup failed for type 0

Re: ERROR: cache lookup failed for type 0

Re: ERROR: cache lookup failed for type 0

Re: ERROR: cache lookup failed for type 0

Re: ERROR: cache lookup failed for type 0