Re: Adjust ndistinct for eqjoinsel

From Tom Lane
Subject Re: Adjust ndistinct for eqjoinsel
Date
Msg-id 3394045.1657900601@sss.pgh.pa.us
In response to Adjust ndistinct for eqjoinsel  (Zhenghua Lyu <zlyu@vmware.com>)
List pgsql-hackers
Zhenghua Lyu <zlyu@vmware.com> writes:
>     I ran the TPC-DS benchmark on Postgres and found that the join size estimation has several problems.
>     For example, ndistinct is key to join selectivity estimation, but this value does not take the rel's
>     restrictions into account. I hit some cases in the function eqjoinsel where nd is much larger than vardata.rel->rows.

>     Accurate estimation needs a good math model that considers the dependency between the join var and the vars in the restriction.
>     But at least, ndistinct should not be greater than the number of rows.

>     See the attached patch to adjust nd in eqjoinsel.

We're very unlikely to accept this with no test case and no explanation
of why it's not an overcorrection.  get_variable_numdistinct already
clamps its result to rel->tuples, and I think that by using rel->rows
instead you are probably double-counting the selectivity of the rel's
restriction clauses.
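
To make the concern concrete, here is a rough toy model in Python, not the actual planner code. It assumes the simplest case (no MCV lists, no NULLs), where the equijoin selectivity is roughly 1 / max(nd1, nd2). All function names and numbers below are made up for illustration.

```python
# Toy model only: these helpers are hypothetical, not PostgreSQL functions.

def clamp_ndistinct(nd, limit):
    """Clamp a distinct-value estimate so it never exceeds `limit`."""
    return min(nd, limit)

def eqjoin_selectivity(nd1, nd2):
    """Simplified equijoin selectivity (no MCVs, no NULLs): 1 / max(nd1, nd2)."""
    return 1.0 / max(nd1, nd2)

def estimated_join_rows(rows1, rows2, nd1, nd2):
    """Join size = product of the restricted row counts times the selectivity."""
    return rows1 * rows2 * eqjoin_selectivity(nd1, nd2)

# Made-up statistics: rel A has 1M tuples, and its restriction clauses
# keep 1% of them, so rows_a already reflects the restriction selectivity.
tuples_a, rows_a, raw_nd_a = 1_000_000, 10_000, 100_000
tuples_b, rows_b, raw_nd_b = 1_000, 1_000, 1_000

nd_b = clamp_ndistinct(raw_nd_b, tuples_b)

# Clamp against the unrestricted tuple count.
by_tuples = estimated_join_rows(
    rows_a, rows_b, clamp_ndistinct(raw_nd_a, tuples_a), nd_b)

# Clamp against the restricted row count, as the patch proposes.
by_rows = estimated_join_rows(
    rows_a, rows_b, clamp_ndistinct(raw_nd_a, rows_a), nd_b)

print(f"clamped to tuples: {by_tuples:.0f} rows")
print(f"clamped to rows:   {by_rows:.0f} rows")
```

With these numbers the estimate moves from about 100 rows to about 1000. The restriction's selectivity has gone into rows_a and then again into nd for the same rel. Whether the larger figure is closer to reality depends on how the restriction correlates with the join column, which this toy model says nothing about.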

See the sad history of commit 7f3eba30c, which did something
pretty close to this and eventually got almost entirely reverted
(97930cf57, 0d3b231ee).  I'd be the first to agree that better
estimates here would be great, but it's not as simple as it looks.

            regards, tom lane


