Discussion: cache lookup failed for index
Hi,
I have a londiste consumer database that has some additional user data in it. 4-5 days per week, everything works fine. The data is there too; I haven't yet been able to check whether anything is missing when it goes wrong.
This is the error:
---------------------
pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for index 231808363
pg_dump: [archiver (db)] query was:
SELECT t.tableoid, t.oid, t.relname AS indexname,
       pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef,
       t.relnatts AS indnkeys, i.indkey, i.indisclustered,
       false AS indisreplident, t.relpages,
       c.contype, c.conname, c.condeferrable, c.condeferred,
       c.tableoid AS contableoid, c.oid AS conoid,
       pg_catalog.pg_get_constraintdef(c.oid, false) AS condef,
       (SELECT spcname FROM pg_catalog.pg_tablespace s
        WHERE s.oid = t.reltablespace) AS tablespace,
       t.reloptions AS indreloptions
FROM pg_catalog.pg_index i
JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid)
LEFT JOIN pg_catalog.pg_constraint c
       ON (i.indrelid = c.conrelid AND i.indexrelid = c.conindid
           AND c.contype IN ('p','u','x'))
WHERE i.indrelid = '231800968'::pg_catalog.oid
  AND i.indisvalid AND i.indisready
ORDER BY indexname
---------------------
The index number is a different one every time.
The log says the same thing.
An interesting detail: on the days with successful dumps, the whole job finishes within 1 hour 15 minutes.
But the errors occur at very different times, sometimes 14 hours after the job started (or maybe it started on a different day).
Yet munin doesn't show clear peak loads on the server.
And this is the script:
---------------------
#!/bin/bash
HOST=localhost
PGPORT=5432
BACKUPDIR=/data/dump/afolder
DATABASE=adatabase
SCHEMAS=`psql -p $PGPORT $DATABASE -t --command "select schema_name from information_schema.schemata where schema_name LIKE 'oz_%'"`
for SCHEMA in $SCHEMAS
do
# mv $BACKUPDIR/$SCHEMA.backup.1 $BACKUPDIR/$SCHEMA.backup.2
mv $BACKUPDIR/$SCHEMA.backup $BACKUPDIR/$SCHEMA.backup.1
pg_dump -Fc -Z3 -p $PGPORT -n $SCHEMA $DATABASE > $BACKUPDIR/$SCHEMA.backup
done
---------------------
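One thing worth noting about the loop above: pg_dump's exit status is never checked, so on a day when it fails with the cache lookup error, the rotation still leaves a broken $SCHEMA.backup as the newest file. A minimal hardened sketch (same layout and options as the original; the schema name and paths in the usage comment are hypothetical) dumps to a temporary file first and only rotates on success:

```shell
#!/bin/bash
set -u
PGPORT=5432
BACKUPDIR=/data/dump/afolder
DATABASE=adatabase

dump_schema() {
    local schema=$1
    local out="$BACKUPDIR/$schema.backup"
    # Dump into a temporary file first, so a failed run (e.g. the
    # "cache lookup failed for index" error) cannot clobber the
    # last good backup.
    if pg_dump -Fc -Z3 -p "$PGPORT" -n "$schema" "$DATABASE" > "$out.tmp"; then
        if [ -f "$out" ]; then
            mv "$out" "$out.1"          # rotate previous backup only after success
        fi
        mv "$out.tmp" "$out"
    else
        rm -f "$out.tmp"
        echo "dump of schema $schema failed; previous backup kept" >&2
        return 1
    fi
}

# Hypothetical usage, mirroring the original loop:
# for SCHEMA in $(psql -p "$PGPORT" "$DATABASE" -t -c \
#     "select schema_name from information_schema.schemata where schema_name LIKE 'oz_%'")
# do
#     dump_schema "$SCHEMA"
# done
```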
It seems like an index gets deleted while pg_dump still has it in its to-do list, and by the time pg_dump wants to dump it, it's gone.
But that should not be possible, because of transactions.
Does anyone know what's up?
--
Willy-Bas Loos
On Tue, Jun 28, 2016 at 7:14 PM, Willy-Bas Loos <willybas@gmail.com> wrote:
(...) Does anyone know what's up?
Oh, by the way: this is PostgreSQL 9.3 on Debian 7, with londiste 2.
--
Willy-Bas Loos
Willy-Bas Loos <willybas@gmail.com> writes:
> [ pg_dump sometimes fails with ]
> pg_dump: [archiver (db)] query failed: ERROR: cache lookup failed for
> index 231808363

This wouldn't be too surprising if you're constantly creating and dropping indexes. There's a small window between where pg_dump starts its transaction and where it's able to acquire lock on each table; but since it's working from a transaction-start-time view of the catalogs, it would still expect the table to have all the indexes it did at the start. If you've got a lot of DDL going on, maybe the window wouldn't even be that small: pg_dump's attempt to lock some previous table might've blocked for a while due to DDL on that one.

regards, tom lane
Ah OK. So it needs a lock for the visibility to be registered to the transaction, is that it?
On Wed, Jun 29, 2016 at 11:16 AM, Willy-Bas Loos <willybas@gmail.com> wrote:
Ah OK. So it needs a lock for the visibility to be registered to the transaction, is that it?
Wait, that doesn't make sense. pg_dump merely describes the table in DDL, so it doesn't even need access to the index pages. It only needs to read pg_catalog, AFAIK.
However, the indexes are created as a last step in the pg_dump process (at least in the resulting dump file). So it first writes the DDL to create the table, then dumps the data, and at the end it describes the constraints, indexes and privileges.
So maybe pg_dump first creates the DDL, then acquires an ACCESS SHARE lock on the table for dumping the data, and then when it's time to create DDL for the index, it is gone from pg_catalog.
So what I don't get is - if the above is the case - if pg_dump expects to find an index, it already knows about its existence. Then why does it need to look for it again?
--
Willy-Bas Loos
Willy-Bas Loos <willybas@gmail.com> writes:
> So what i don't get is, -if the above is the case- If pg_dump expects to
> find an index, it already knows about its existence. Then why does it need
> to look for it again?

Because what it does is:

BEGIN ISOLATION LEVEL REPEATABLE READ;  -- run in a single transaction
SELECT ... FROM pg_class;               -- find out what all the tables are
LOCK TABLE foo IN ACCESS SHARE MODE;    -- repeat for each table to be dumped

after which it runs around and collects subsidiary data such as what indexes exist for each table. But the transaction's view of the catalogs was frozen at the start of the first SELECT. So it can see entries for an index in pg_class and pg_index even if that index got dropped between transaction start and where pg_dump was able to lock the index's table. pg_dump can't tell the index is no longer there --- but some of the backend functions it calls can tell, and they throw errors.

There are various ways this might be rejiggered, but none of them entirely remove all risk of failure in the presence of concurrent DDL. Personally I'd recommend just retrying the pg_dump until it succeeds.

regards, tom lane
On Wed, Jun 29, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Willy-Bas Loos <willybas@gmail.com> writes:
> So what i don't get is, -if the above is the case- If pg_dump expects to
> find an index, it already knows about its existence. Then why does it need
> to look for it again?
pg_dump can't tell the index is no longer there --- but some of the
backend functions it calls can tell, and they throw errors.
There are various ways this might be rejiggered, but none of them
entirely remove all risk of failure in the presence of concurrent DDL.
Personally I'd recommend just retrying the pg_dump until it succeeds.
regards, tom lane
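As a practical footnote, Tom's "just retry" suggestion can be sketched as a small wrapper (the pg_dump invocation in the comment is hypothetical and simply mirrors the options from the script earlier in the thread):

```shell
#!/bin/bash
# Retry a command a few times before giving up. pg_dump exits non-zero
# on the "cache lookup failed" error, and a fresh attempt takes a new
# catalog snapshot, so a retry usually succeeds.
retry() {
    local max_tries=$1; shift
    local try
    for (( try = 1; try <= max_tries; try++ )); do
        if "$@"; then
            return 0                       # success, stop retrying
        fi
        sleep "${RETRY_DELAY:-5}"          # pause before the next attempt
    done
    echo "giving up after $max_tries attempts: $*" >&2
    return 1
}

# Hypothetical usage, mirroring the original loop:
# retry 3 pg_dump -Fc -Z3 -p "$PGPORT" -n "$SCHEMA" "$DATABASE" \
#     -f "$BACKUPDIR/$SCHEMA.backup"
```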
Now that I know what it is, I can live with it.
Thanks for the insight!
--
Willy-Bas Loos