Обсуждение: Re: Large Database \d: ERROR: cache lookup failed for relation ...

Поиск
Список
Период
Сортировка

Re: Large Database \d: ERROR: cache lookup failed for relation ...

От
Erik Jones
Дата:
I originally sent this message from my gmail account yesterday as we
were having issues with our work mail servers yesterday, but seeing
that it hasn't made it to the lists yet, I'm resending from my
registered address.  You have my apologies if you receive this twice.

"Thomas F. O'Connell" <tf ( at ) o ( dot ) ptimized ( dot ) com> writes:
 > I'm dealing with a database where there are ~150,000 rows in

 > information_schema.tables. I just tried to do a \d, and it came back
 > with this:

 > ERROR:  cache lookup failed for relation [oid]

 > Is this indicative of corruption, or is it possibly a resource issue?

Greetings,

This message is a follow-up to Thomas's message quoted above (we're
working together on the same database). He received one response when
he sent the above message which was from Tom Lane and can be easily
summarized as him having said that that could happen tables were
being created or dropped while running the \d in psql. Unfortunately,
that wasn't the case, we have now determined that there is some
corruption in our database and we are hoping some of you back-end
gurus might have some suggestions.

How we verified that there is corruption was simply to reindex all of
our tables in addition to getting the same errors when running a dump
this past weekend.  We so far have a list of five tables for which
reindex fails with the error: "ERROR: could not open relation with
OID xxxx" (sub xxxx with the five different #s) and one that fails
reindexing with "ERROR: xxxxx is an index" where is an index on a
completely different table. After dropping all of the indexes on
these tables (a couple didn't have any to begin with), we still
cannot run reindex on them. In addition, we can't drop the tables
either (we get the same errors). We can however run alter table
statements on them. So, we have scheduled a downtime for an evening
later this week wherein we plan on bringing the database down for a
REINDEX SYSTEM and before that we are going to run a dump excluding
those tables, restore that on a separate machine and see if these
errors crop up there anywhere. Is there anything else anyone can
think of that we can do to narrow down where the actual corruption is
or how to fix it?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Large Database \d: ERROR: cache lookup failed for relation ...

От
Jim Nasby
Дата:
I'm working with these guys to resolve the immediate issue, but I
suspect there's a race condition somewhere in the code.

What's happened is that OIDs have been changed in the system. There's
not a lot of table DDL that happens, but there is a substantial
amount of view DDL that can take place. In a nutshell, tables will
sometimes have fields added to them, and when that happens a whole
set of views needs to be re-created to take the new fields into account.

The files for corrupted tables do exist; this seems to be mostly a
catalog corruption issue. I'm seeing both what appear to be
inconsistencies between relcache and the catalog tables as well as
corruption between tables themselves:

emma2=# select * from userdata_8464_campaigns;
ERROR:  could not open relation with OID 138807643
emma2=# \d userdata_8464_campaigns
                                         Table
"public.userdata_8464_campaigns"
       Column             |            Type
|                            Modifiers
-------------------------------+-----------------------------
+------------------------------------------------------------------
campaign_id              | bigint                      | not null
default nextval(('emma_campaigns_seq'::text)::regclass)
account_id               | bigint                      | not null
cep_object_id                 | bigint                      | not
null default nextval(('cep_object_seq'::text)::regclass)
campaign_name            | character varying(255)      | not null
campaign_subject         | character varying(255)      | not null
layout_page_id           | bigint                      | not null
layout_content_id        | bigint                      | not null
campaign_create_date     | timestamp without time zone | not null
default now()
campaign_last_mod_date   | timestamp without time zone | not null
default now()
campaign_status          | character varying(50)       | not null
campaign_parent_id       | bigint                      |
published_campaign_id    | bigint                      |
campaign_plaintext       | text                        |
campaign_plaintext_ds    | timestamp without time zone |
delivery_old_score       | double precision            |
campaign_person_defaults | text                        |
Inherits: emma_campaigns

select oid from pg_class where relname='userdata_8464_campaigns';
   oid
--------
533438
(1 row)

And that file actually does exist on disk...

select * from pg_index where indexrelid=138807643;
indexrelid | indrelid | indnatts | indisunique | indisprimary |
indisclustered | indisvalid | indkey | indclass | indexprs | indpred
------------+----------+----------+-------------+--------------
+----------------+------------+--------+----------+----------+---------
   138807643 |   533438 |        1 | t           | t            |
f              | t          | 1      | 1980     |          |
(1 row)

select * from pg_class where oid=138807643;
relname | relnamespace | reltype | relowner | relam | relfilenode |
reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid
| relhasindex | relisshared | relkind | relnatts | relchecks |
reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey
| relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------+--------------+---------+----------+-------+-------------
+---------------+----------+-----------+---------------
+---------------+-------------+-------------+---------+----------
+-----------+-------------+----------+----------+---------
+------------+------------+-------------+----------------
+--------------+--------+------------
(0 rows)

On Jun 5, 2007, at 11:27 AM, Erik Jones wrote:

> I originally sent this message from my gmail account yesterday as
> we were having issues with our work mail servers yesterday, but
> seeing that it hasn't made it to the lists yet, I'm resending from
> my registered address.  You have my apologies if you receive this
> twice.
>
> "Thomas F. O'Connell" <tf ( at ) o ( dot ) ptimized ( dot ) com>
> writes:
> > I'm dealing with a database where there are ~150,000 rows in
>
> > information_schema.tables. I just tried to do a \d, and it came back
> > with this:
>
> > ERROR:  cache lookup failed for relation [oid]
>
> > Is this indicative of corruption, or is it possibly a resource
> issue?
>
> Greetings,
>
> This message is a follow-up to Thomas's message quoted above (we're
> working together on the same database). He received one response
> when he sent the above message which was from Tom Lane and can be
> easily summarized as him having said that that could happen tables
> were being created or dropped while running the \d in psql.
> Unfortunately, that wasn't the case, we have now determined that
> there is some corruption in our database and we are hoping some of
> you back-end gurus might have some suggestions.
>
> How we verified that there is corruption was simply to reindex all
> of our tables in addition to getting the same errors when running a
> dump this past weekend.  We so far have a list of five tables for
> which reindex fails with the error: "ERROR: could not open relation
> with OID xxxx" (sub xxxx with the five different #s) and one that
> fails reindexing with "ERROR: xxxxx is an index" where is an index
> on a completely different table. After dropping all of the indexes
> on these tables (a couple didn't have any to begin with), we still
> cannot run reindex on them. In addition, we can't drop the tables
> either (we get the same errors). We can however run alter table
> statements on them. So, we have scheduled a downtime for an evening
> later this week wherein we plan on bringing the database down for a
> REINDEX SYSTEM and before that we are going to run a dump excluding
> those tables, restore that on a separate machine and see if these
> errors crop up there anywhere. Is there anything else anyone can
> think of that we can do to narrow down where the actual corruption
> is or how to fix it?
>
> Erik Jones
>
> Software Developer | Emma®
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)