Обсуждение: INSERT/UPDATE statements sometimes choosing not to use PK index?

Поиск
Список
Период
Сортировка

INSERT/UPDATE statements sometimes choosing not to use PK index?

От
Matt
Дата:
I have a relatively simple data load script, which upserts (UPDATE existing rows, INSERT new rows), which should be supported by the primary key index, the only index on this table: 

    UPDATE destination SET ... FROM staging WHERE staging.pk = destination.pk
    INSERT INTO destination SELECT ... FROM staging WHERE NOT EXISTS(SELECT * FROM destination WHERE pk  = staging.pk); 

In most cases, EXPLAIN and runtime tell me the index is utilized. However, sometime on back to back runs (5 minute intervals) the runtime explodes and EXPLAIN shows me that the PK index is not used, as both the staging table and destination table a sequentially scanned. The source data maintains the same pattern, and this load job is the only write activity in this particular database. 

A left anti-join in the above pseudo-query explains the same, and seems to make the problem occur more frequently. 

What could cause PostgreSQL (9.1) to stop using an index in a case like this, when it normally and expectedly uses it? 

Re: INSERT/UPDATE statements sometimes choosing not to use PK index?

От
Tom Lane
Дата:
Matt <bsg075@gmail.com> writes:
> In most cases, EXPLAIN and runtime tell me the index is utilized. However,
> sometime on back to back runs (5 minute intervals) the runtime explodes and
> EXPLAIN shows me that the PK index is not used, as both the staging table
> and destination table a sequentially scanned.

You haven't explained what you do with the "staging table", but maybe you
need to ANALYZE it after you've loaded fresh data into it.  The stats for
the main table are presumably somewhat stable, but the stats for the
staging table maybe not.  The apparent correlation to consecutive runs
lends some support to this theory, as that would reduce the time window
for auto-ANALYZE to fix it for you.

            regards, tom lane


Re: INSERT/UPDATE statements sometimes choosing not to use PK index?

От
Matt
Дата:
Since this is an upsert, the staging table simply accepts a copy of pipe-delimited data via a COPY FROM STDIN, and a subsequent SQL script performs the UPDATE/INSERT. The staging table is then truncated (or delete without where) for the next run: Truncate staging, COPY into staging, update then insert destination from staging, repeat on next 5 minute clock cycle. 

Since the staging table is fully loaded into the destination table, a full scan was not a concern, but the lack of index usage on the destination. For example, this is normally supported by the PK index, and performs well:

    INSERT INTO destination (…)
    SELECT (…) FROM staging
    WHERE NOT EXISTS (SELECT * FROM source WHERE source.id = destination.id)

The above runs in expected time when the explain plan shows an index scan on the destination primary key index, and a seq scan on the staging table. This will continue for many runs, until something causes the engine to stop using the destination PK index, and begin scanning both the destination and staging. What triggers this is the interesting question.The only system event I can (loosely) correlate with the problem is the start of a nightly pg_dump, but as a read-only process, this would not be changing any index stats, correct?

An ANALYZE on staging is possible after each load cycle, but time does not permit on the destination table. I have been focusing on the destination because it has the index that is not being used. Will stats on the staging table affect index selection on the destination in a case like this?

In the process of attempting to diagnose this, both tables involved have been vacuumed (full), analyzed. I have also moved staging to an SSD volume, and created an equivalent index on staging - which is not used in any plan, nor do I expect to as there is no filter criteria on staging, and the index maintenance on staging would seem to be an unneeded overhead. But in this case, is there any advantage to an index on staging?

For reference, it is possible (not enough test cycles to verify) that left anti-join makes this situation worse, even though the explain plans appear identical:

    INSERT INTO destination (…)
    SELECT (…) FROM staging
    LEFT JOIN destination ON destination.id = staging.id
    WHERE destination.id IS NULL


On 29 Oct 2013, at 9:45, Tom Lane wrote:

> Matt <bsg075@gmail.com> writes:
>> In most cases, EXPLAIN and runtime tell me the index is utilized. However,
>> sometime on back to back runs (5 minute intervals) the runtime explodes and
>> EXPLAIN shows me that the PK index is not used, as both the staging table
>> and destination table a sequentially scanned.
>
> You haven't explained what you do with the "staging table", but maybe you
> need to ANALYZE it after you've loaded fresh data into it.  The stats for
> the main table are presumably somewhat stable, but the stats for the
> staging table maybe not.  The apparent correlation to consecutive runs
> lends some support to this theory, as that would reduce the time window
> for auto-ANALYZE to fix it for you.
>
> regards, tom lane