Re: Bad estimate with partial index

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Bad estimate with partial index
Дата
Msg-id 34f4218c-f267-a851-d744-75e242661848@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Bad estimate with partial index  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Bad estimate with partial index  (Tom Lane <tgl@sss.pgh.pa.us>)
RE: Bad estimate with partial index  (André Hänsel <andre@webkr.de>)
Список pgsql-hackers

On 4/20/22 09:58, Tomas Vondra wrote:
> On 4/19/22 23:08, Tom Lane wrote:
>> I wrote:
>>> it looks like the problem is that the extended stats haven't been used
>>> while forming the estimate of the number of index entries retrieved, so
>>> we overestimate the cost of using this index.
>>> That seems like a bug.  Tomas?
>>
>> I dug into this enough to locate the source of the problem.
>> btcostestimate includes the partial index clauses in what it
>> sends to clauselist_selectivity, but per the comments for
>> add_predicate_to_index_quals:
>>
>>  * Note that indexQuals contains RestrictInfo nodes while the indpred
>>  * does not, so the output list will be mixed.  This is OK for both
>>  * predicate_implied_by() and clauselist_selectivity(), but might be
>>  * problematic if the result were passed to other things.
>>
>> That comment was true when it was written, but it's been falsified
>> by the extended-stats patches, which have added a whole lot of logic
>> in and under clauselist_selectivity that ignores clauses that are not
>> RestrictInfos.
>>
>> While we could perhaps fix this by having add_predicate_to_index_quals
>> add RestrictInfos, I'm inclined to feel that the extended-stats code
>> is in the wrong.  The contract for clauselist_selectivity has always
>> been that it could optimize if given RestrictInfos rather than bare
>> clauses, not that it would fail to work entirely without them.
>> There are probably more places besides add_predicate_to_index_quals
>> that are relying on that.
>>
> 
> Yes, that seems like a fair assessment. I'll look into fixing this, not
> sure how invasive it will get, though.
> 

So, here's a WIP fix that improves the example shared by Andre, and does
not seem to break anything (or at least not any regression test).

The whole idea is that instead of bailing out for non-RestrictInfo case,
it calculates the necessary information for the clause from scratch.
This means relids and pseudoconstant flag, which are checked to decide
if the clause is compatible with extended stats.

But when inspecting how to calculate pseudoconstant, I realized that
maybe that's not really needed. Per distribute_qual_to_rels() we only
set it to 'true' when bms_is_empty(relids), and we already check that
relids is a singleton, so it can't be empty - which means pseudoconstant
can't be true either.


Andre, are you in position to test this fix with your application? Which
Postgres version are you using, actually?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: using an end-of-recovery record in all cases
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Odd off-by-one dirty buffers and checkpoint buffers written