Re: n_distinct off by a factor of 1000

From: Michael Lewis
Subject: Re: n_distinct off by a factor of 1000
Date:
Msg-id: CAHOFxGpv06xOQqxXcfCrb-A5LgwOyBaeHuZvVP3CEvk8L=MsTQ@mail.gmail.com
In reply to: Re: n_distinct off by a factor of 1000  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses: Re: n_distinct off by a factor of 1000  (Klaudie Willis <Klaudie.Willis@protonmail.com>)
Re: n_distinct off by a factor of 1000  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List: pgsql-general


On Wed, Jun 24, 2020, 2:35 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
Yes, estimating the number of distinct values from a relatively small
sample is hard when you don't know the underlying distribution. It might
be possible to analyze the sample to find the distribution and get a
better estimate. But I'm not sure how useful that would really be: If
a few values are very common and most are very rare, you are probably also
much more likely to use the common values in a query. And for those you
would massively underestimate their frequency if you had an accurate
n_distinct value. That might be just as bad or even worse.


This would only be true for values that are "common" but not in the MCV list, right?

If we could increase the sampling ratio beyond the hard-coded 300x to get a more representative sample, and use that to estimate ndistinct (and also the frequency of the most common values), but only actually store the 100 MCVs (or whatever the stats target is set to for the system or column), then the issue might be mitigated without increasing planning time from stats that are larger than prudent. The "only" cost should be longer processing time when (auto)analyzing... plus, I suppose, the overhead of considering this potential new setting in all analyze cases.
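In the meantime, PostgreSQL does let you work around a bad estimate per column: you can pin n_distinct directly, or raise the statistics target to enlarge the sample (though today that also enlarges the stored MCV list, which is exactly the coupling described above). A sketch, assuming a hypothetical table t with a skewed column k:

```sql
-- Pin the estimate when you know the real cardinality;
-- takes effect at the next ANALYZE. A negative value means
-- a fraction of the row count (e.g. -0.05 = 5% of rows).
ALTER TABLE t ALTER COLUMN k SET (n_distinct = 1000000);
ANALYZE t;

-- Alternatively, enlarge the sample (and, as a side effect,
-- the stored MCV list) for this column only; max is 10000.
ALTER TABLE t ALTER COLUMN k SET STATISTICS 1000;
ANALYZE t;
```

The first form sidesteps the sampling problem entirely but has to be maintained by hand as the data changes; the second improves the estimate statistically at the cost of larger stats and longer planning.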

In the pgsql-general list, by date sent:

Previous
From: "Peter J. Holzer"
Date:
Message: Re: n_distinct off by a factor of 1000
Next
From: Tim Cross
Date:
Message: Re: Persistent Connections