Обсуждение: ERROR: cache lookup failed for type 0
Hi, Am not sure if this is something we've done wrong or maybe a bug. Whenever any kind of query is done on the table below, this is the result: ispdb_vxe=> select * from pm.carrier_on_13642; ERROR: cache lookup failed for type 0 I first noticed it when I noticed that the regular backups were failing - pg_dump gives the same output and stops. I have tried 1. postgresql restart 2. vacuum full on the table 3. reindex of the table I am using Postgres 8.2.3 running on Centos Linux 4.4. Is there anything I can do to recover this table? Is there any danger in dropping the table if it cannot be recreated? Should this be reported as a bug? (although I have no clue as to how it got into this state). Regards, Mike
"Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes: > Whenever any kind of query is done on the table below, this is the > result: > ispdb_vxe=> select * from pm.carrier_on_13642; > ERROR: cache lookup failed for type 0 Hmm, seems like something happened to your system catalogs, but where exactly? Try this: select * from pg_attribute where attrelid = 'pm.carrier_on_13642'::regclass; and look whether any of the atttypid fields are zero. If so, does anything else look wrong? If not, the problem must be further afield --- have you got any triggers, check constraints, etc on this table? (Can you do "\d" on it?) regards, tom lane
Hi Tom, Thanks for your response. None of the typeid atttypid fields were 0: attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount ----------+---------------------------+----------+---------------+------ --+--------+----------+-------------+-----------+----------+------------ +----------+------------+-----------+--------------+------------+------- ------ 167581 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0 167581 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0 167581 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0 167581 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0 167581 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0 167581 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | f | t | 0 167581 | ropid | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | t | f | f | f | 1 167581 | moid | 23 | -1 | 4 | 2 | 0 | -1 | -1 | t | p | i | t | f | f | f | 1 167581 | tblock | 21 | -1 | 2 | 3 | 0 | -1 | -1 | t | p | s | t | f | f | f | 1 167581 | pmtransmittedcarrierpower | 1016 | -1 | -1 | 4 | 0 | -1 | -1 | f | x | d | f | f | f | f | 1 167581 | pmaveragerssi | 1016 | -1 | -1 | 5 | 0 | -1 | -1 | f | x | d | f | f | f | f | 1 167581 | pm_count | 20 | -1 | 8 | 6 | 0 | -1 | -1 | f | p | d | f | f | f | f | 1 (12 rows) I don't know enough about this catalog table to say if anything else is unusual about it. Comparing it with other tables that do not have a problem did not reveal anything startling (to my eyes anyway). The table is part of an inheritance structure. We are using inheritance to partition the data based on a number of criteria. The top level table ('carrier') has no records of it's own but defines the structure of information stored in the table set. It's descendants (eg. 'carrier_on') are for either the original raw data, or data summarised in one way or another. The third level tables (eg 'carrier_on_13642') are divided into blocks of time. All of the tables in this structure (and also parallel table structures that we have) are linked back to two central tables via foreign keys on the fields ropid and moid. There are a lot of relations which have these fields as foreign references: this is causing another (unrelated I think) problem: when we try to delete records from the table containing the field ropid, it bombs with 'out of memory'. => \d pm.carrier_on_13642; Table "pm.carrier_on_13642" Column | Type | Modifiers ---------------------------+----------+----------- ropid | integer | not null moid | integer | not null tblock | smallint | not null pmtransmittedcarrierpower | bigint[] | pmaveragerssi | bigint[] | pm_count | bigint | Indexes: "carrier_on_13642_pkey" PRIMARY KEY, btree (ropid, moid) "carrier_on_13642_moid_idx" btree (moid) "carrier_on_13642_ropid_idx" btree (ropid) Check constraints: "carrier_on_13642_tblock_check" CHECK (tblock = 13642::smallint) Foreign-key constraints: "carrier_on_13642_moid_fkey" FOREIGN KEY (moid) REFERENCES pm.mo(moid) ON DELETE CASCADE "carrier_on_13642_ropid_fkey" FOREIGN KEY (ropid) REFERENCES pm.rop(ropid) ON DELETE CASCADE Inherits: carrier_on Last night I did a pg_dump excluding this particular table. It failed, complaining about another table pm.carrier_oo_13642 with the same error. I then excluded that table also, after which the dump succeeded. What does "ERROR: cache lookup failed for type 0" mean? I searched all over the place for a good descripion but could not find one. Regards // Mike -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, 25 May 2007 11:52 PM To: Michael Harris (BR/EPA) Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] ERROR: cache lookup failed for type 0 "Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes: > Whenever any kind of query is done on the table below, this is the > result: > ispdb_vxe=> select * from pm.carrier_on_13642; > ERROR: cache lookup failed for type 0 Hmm, seems like something happened to your system catalogs, but where exactly? Try this: select * from pg_attribute where attrelid = 'pm.carrier_on_13642'::regclass; and look whether any of the atttypid fields are zero. If so, does anything else look wrong? If not, the problem must be further afield --- have you got any triggers, check constraints, etc on this table? (Can you do "\d" on it?) regards, tom lane
"Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes: > None of the typeid atttypid fields were 0: Hmm, but you've got a couple of bigint-array columns: > 167581 | pmtransmittedcarrierpower | 1016 | -1 | > -1 | 4 | 0 | -1 | -1 | f | x > | d | f | f | f | f | > 1 > 167581 | pmaveragerssi | 1016 | -1 | > -1 | 5 | 0 | -1 | -1 | f | x > | d | f | f | f | f | > 1 So that leads to a theory I didn't think of before, which is that this is not catalog corruption but data corruption. Array values include the type OID of their elements (to allow a single array_out function to print the contents of any array value). If you had a row in which one of these fields had got zeroed out somehow, the error would be explainable. > What does "ERROR: cache lookup failed for type 0" mean? It means some bit of code tried to look up a pg_type entry for type OID 0, which can't possibly be a valid type OID. So wherever the bit of code got that OID from is corrupted. You could possibly learn a bit more by doing "\set VERBOSITY verbose" in psql before provoking the error; that would tell you exactly which bit of code is trying to look up the bad OID. Using gdb to get a stack trace from the errfinish call would be even more informative, if you have a debug-enabled build. But I suspect at this point that it's a corrupt-data problem and you should attack it on the basis of figuring out just which row contains the bad value, so you can fix or delete it. regards, tom lane
Hi Tom, Hmm .. Interesting. A few weeks back I had a corrupted table (due to a disk error). I temporarily set zero_damaged_pages and vacuumed the tables concerned. Now that I look back through the logs, the affected table was utrancell_oo_13642. The fact that the number on the end was the same means that the table contained data from the same day as the one that is now giving trouble (ie. This table was being populated on the same day as carrier_on_13642). Maybe the carrier table was also corrupted by the disk error (although I did not notice any problem at the time). Thanks for the info. I will try to locate the corrupted row(s), maybe pg_filedump can help with that. Regards // Mike -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Saturday, 26 May 2007 9:38 AM To: Michael Harris (BR/EPA) Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] ERROR: cache lookup failed for type 0 "Michael Harris \(BR/EPA\)" <michael.harris@ericsson.com> writes: > None of the typeid atttypid fields were 0: Hmm, but you've got a couple of bigint-array columns: > 167581 | pmtransmittedcarrierpower | 1016 | -1 | > -1 | 4 | 0 | -1 | -1 | f | x > | d | f | f | f | f | > 1 > 167581 | pmaveragerssi | 1016 | -1 | > -1 | 5 | 0 | -1 | -1 | f | x > | d | f | f | f | f | > 1 So that leads to a theory I didn't think of before, which is that this is not catalog corruption but data corruption. Array values include the type OID of their elements (to allow a single array_out function to print the contents of any array value). If you had a row in which one of these fields had got zeroed out somehow, the error would be explainable. > What does "ERROR: cache lookup failed for type 0" mean? It means some bit of code tried to look up a pg_type entry for type OID 0, which can't possibly be a valid type OID. So wherever the bit of code got that OID from is corrupted. You could possibly learn a bit more by doing "\set VERBOSITY verbose" in psql before provoking the error; that would tell you exactly which bit of code is trying to look up the bad OID. Using gdb to get a stack trace from the errfinish call would be even more informative, if you have a debug-enabled build. But I suspect at this point that it's a corrupt-data problem and you should attack it on the basis of figuring out just which row contains the bad value, so you can fix or delete it. regards, tom lane