Thread: cache lookup failed for index

cache lookup failed for index

From: Willy-Bas Loos
Date:
Hi,

I have a londiste consumer database that has some additional user data in it.
The user data is in schemas with the prefix oz_.
Every night we dump those schemas with pg_dump.

About 2-3 times per week, cron emails me that something went wrong.
That means that 4-5 days per week everything works fine. The data is there too; I haven't yet been able to check whether anything is missing when it goes wrong.

This is the error:
---------------------
pg_dump: [archiver (db)] query failed: ERROR:  cache lookup failed for index 231808363
pg_dump: [archiver (db)] query was: SELECT t.tableoid, t.oid, t.relname AS indexname, pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef, t.relnatts AS indnkeys, i.indkey, i.indisclustered, false AS indisreplident, t.relpages, c.contype, c.conname, c.condeferrable, c.condeferred, c.tableoid AS contableoid, c.oid AS conoid, pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, (SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, t.reloptions AS indreloptions FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) LEFT JOIN pg_catalog.pg_constraint c ON (i.indrelid = c.conrelid AND i.indexrelid = c.conindid AND c.contype IN ('p','u','x')) WHERE i.indrelid = '231800968'::pg_catalog.oid AND i.indisvalid AND i.indisready ORDER BY indexname

---------------------
The index number is different every time.
The server log shows the same error.

An interesting detail:
On the days with successful dumps, the last dump finishes within 1 hour 15 minutes of the start.
But the errors occur at very different times, sometimes 14 hours after the job started (or maybe it started on a different day).
Yet munin doesn't show any clear peak loads on the server.


And this is the script:
---------------------
#!/bin/bash

HOST=localhost
PGPORT=5432
BACKUPDIR=/data/dump/afolder
DATABASE=adatabase

# Collect all schemas with the oz_ prefix.
SCHEMAS=$(psql -h $HOST -p $PGPORT $DATABASE -t --command "select schema_name from information_schema.schemata where schema_name LIKE 'oz_%'")

for SCHEMA in $SCHEMAS
do
#    mv $BACKUPDIR/$SCHEMA.backup.1 $BACKUPDIR/$SCHEMA.backup.2
    # Rotate the previous dump, then take a fresh custom-format dump.
    mv "$BACKUPDIR/$SCHEMA.backup" "$BACKUPDIR/$SCHEMA.backup.1"
    pg_dump -h $HOST -Fc -Z3 -p $PGPORT -n "$SCHEMA" $DATABASE > "$BACKUPDIR/$SCHEMA.backup"
done
---------------------

It seems like an index gets deleted while pg_dump has it in some task list, and by the time pg_dump wants to dump it, it's gone.
But that should not be possible, because of transactions.

Does anyone know what's up?
--
Willy-Bas Loos

Re: cache lookup failed for index

From: Willy-Bas Loos
Date:

On Tue, Jun 28, 2016 at 7:14 PM, Willy-Bas Loos <willybas@gmail.com> wrote:

(...)
Does anyone know what's up?
--


Oh, by the way: this is PostgreSQL 9.3 on Debian 7, with Londiste 2.

--
Willy-Bas Loos

Re: cache lookup failed for index

From: Tom Lane
Date:
Willy-Bas Loos <willybas@gmail.com> writes:
> [ pg_dump sometimes fails with ]
> pg_dump: [archiver (db)] query failed: ERROR:  cache lookup failed for
> index 231808363

This wouldn't be too surprising if you're constantly creating and dropping
indexes.  There's a small window between where pg_dump starts its
transaction and where it's able to acquire lock on each table; but since
it's working from a transaction-start-time view of the catalogs, it would
still expect the table to have all the indexes it did at the start.

If you've got a lot of DDL going on, maybe the window wouldn't even be
that small: pg_dump's attempt to lock some previous table might've blocked
for a while due to DDL on that one.

            regards, tom lane
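
This race can be reproduced by hand. Here is a minimal two-session sketch; the scratch table t, its index t_i and the database name are assumptions for illustration, and while 9.3 raises the same error on the stale lookup, newer releases may return NULL instead:
---------------------
#!/bin/bash
# Session 1 takes a repeatable-read snapshot and remembers the index OID.
# Session 2 then drops the index. When session 1 finally asks the backend
# to describe the remembered OID, the syscache lookup fails.

psql -d adatabase -c 'CREATE TABLE t (x int); CREATE INDEX t_i ON t (x);'

psql -d adatabase <<'SQL' &
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT oid AS idxoid FROM pg_class WHERE relname = 't_i' \gset
SELECT pg_sleep(5);                      -- window for the concurrent DROP
SELECT pg_get_indexdef(:'idxoid'::oid);  -- ERROR: cache lookup failed for index ...
COMMIT;
SQL

sleep 1
psql -d adatabase -c 'DROP INDEX t_i;'   # the concurrent DDL
wait
---------------------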


Re: cache lookup failed for index

From: Willy-Bas Loos
Date:

Ah, OK. So it needs a lock for the index's visibility to be registered to the transaction, is that it?

Re: cache lookup failed for index

From: Willy-Bas Loos
Date:
On Wed, Jun 29, 2016 at 11:16 AM, Willy-Bas Loos <willybas@gmail.com> wrote:

Ah, OK. So it needs a lock for the index's visibility to be registered to the transaction, is that it?


Wait, that doesn't make sense. pg_dump merely describes the table in DDL, so it doesn't even need access to the index pages. It only needs to read pg_catalog, AFAIK.
However, the indexes are created as a last step in the pg_dump process (at least in the resulting dump file). So it first writes the DDL to create the table, then dumps the data, and at the end it describes the constraints, indexes and privileges.
So maybe pg_dump first creates the DDL, then acquires an ACCESS SHARE lock on the table for dumping the data, and then, when it's time to create DDL for the index, it is gone from pg_catalog.
So what I don't get is: if the above is the case, and pg_dump expects to find an index, then it already knows about its existence. Why does it need to look it up again?

--
Willy-Bas Loos

Re: cache lookup failed for index

From: Tom Lane
Date:
Willy-Bas Loos <willybas@gmail.com> writes:
> So what I don't get is: if the above is the case, and pg_dump expects to
> find an index, then it already knows about its existence. Why does it
> need to look it up again?

Because what it does is:

BEGIN ISOLATION LEVEL REPEATABLE READ;  -- run in a single transaction
SELECT ... FROM pg_class;  -- find out what all the tables are
LOCK TABLE foo IN ACCESS SHARE MODE;  -- repeat for each table to be dumped

after which it runs around and collects subsidiary data such as what
indexes exist for each table.  But the transaction's view of the catalogs
was frozen at the start of the first SELECT.  So it can see entries for
an index in pg_class and pg_index even if that index got dropped between
transaction start and where pg_dump was able to lock the index's table.
pg_dump can't tell the index is no longer there --- but some of the
backend functions it calls can tell, and they throw errors.

There are various ways this might be rejiggered, but none of them
entirely remove all risk of failure in the presence of concurrent DDL.
Personally I'd recommend just retrying the pg_dump until it succeeds.

            regards, tom lane


Re: cache lookup failed for index

From: Willy-Bas Loos
Date:

On Wed, Jun 29, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Willy-Bas Loos <willybas@gmail.com> writes:
> So what I don't get is: if the above is the case, and pg_dump expects to
> find an index, then it already knows about its existence. Why does it
> need to look it up again?


pg_dump can't tell the index is no longer there --- but some of the
backend functions it calls can tell, and they throw errors.

There are various ways this might be rejiggered, but none of them
entirely remove all risk of failure in the presence of concurrent DDL.
Personally I'd recommend just retrying the pg_dump until it succeeds.

                        regards, tom lane

Now that I know what it is, I can live with it.
Thanks for the insight!
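
For completeness, a retry wrapper along these lines could implement Tom's suggestion. It's an untested sketch reusing the variables from my script above; the .tmp suffix and the three attempts are arbitrary choices:
---------------------
for SCHEMA in $SCHEMAS
do
    for TRY in 1 2 3
    do
        # Dump to a temporary file so a failed attempt can't clobber
        # the previous backup; rotate only after pg_dump succeeds.
        if pg_dump -h $HOST -Fc -Z3 -p $PGPORT -n "$SCHEMA" $DATABASE \
               > "$BACKUPDIR/$SCHEMA.backup.tmp"
        then
            mv "$BACKUPDIR/$SCHEMA.backup" "$BACKUPDIR/$SCHEMA.backup.1"
            mv "$BACKUPDIR/$SCHEMA.backup.tmp" "$BACKUPDIR/$SCHEMA.backup"
            break
        fi
        sleep 60    # give the concurrent DDL time to finish, then retry
    done
done
---------------------
Rotating only after a successful dump also means the last good backup survives a night where every retry fails.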

--
Willy-Bas Loos