Re: Postgres optimizer choosing wrong index

Поиск
Список
Период
Сортировка
От Jack Orenstein
Тема Re: Postgres optimizer choosing wrong index
Дата
Msg-id 4900ADD9.2090307@hds.com
обсуждение исходный текст
Ответ на Re: Postgres optimizer choosing wrong index  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Postgres optimizer choosing wrong index  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Tom Lane wrote:
> Jack Orenstein <jack.orenstein@hds.com> writes:
>> If I run EXPLAIN on this query, (plugging in values 1 and 2 for the variables),
>> before VACUUM ANALYZE, I get the desired execution plan:
>
>>   Index Scan using idx_df on T  (cost=0.00..4.83 rows=1 width=454)
>>     Index Cond: ((dh = 1) AND (fh = 2))
>
>> But after VACUUM ANALYZE:
>
>>   Index Scan using idx_dn on T  (cost=0.00..5.27 rows=1 width=561)
>>     Index Cond: (dh = 1)
>>     Filter: (fh = 2)
>
>> Notice that postgres is now using the other index. This behavior is somewhat
>> dependent on the values plugged in. I ran a query to count dh values:
>
>>      select dir_hash, count(*) from external_file group by dir_hash;
>
>>        dh        | count
>>      ------------+--------
>>        916645488 |  20000
>>       1057692240 | 200000
>
> So you're plugging in a value that doesn't appear in the index, and
> Postgres knows it?
>
> I think that in this example, the estimated costs are going to be
> exactly the same either way, so it's kind of a tossup which index
> gets chosen --- and it's not actually going to matter any at runtime
> either.  Descending the btree to find that there's no matching entry
> is going to take just about the same amount of time in either index.
> If you plug in a value that *does* occur in the table it should probably
> choose the more-relevant index consistently.

Unfortunately, it matters a lot at runtime. The dh value is not very selective,
as shown by the statistics above. But the fh value is highly selective. The
idx_df index makes use of this selectivity; the idx_dn index does not. In fact,
the reason I noticed this problem was horrendous runtime performance, sometimes.
I started poking around and noticed the VACUUM ANALYZE dependence.

Given that the estimated costs are about the same, I'm wondering why postgres
doesn't go with the index that can do more work in the "Index Cond" part of the
plan.

Do you know how I can force use of the idx_df index?

Jack

В списке pgsql-general по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Annoying Reply-To
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: [HACKERS] Hot Standby utility and administrator functions