Re: n_distinct off by a factor of 1000

From: Klaudie Willis
Subject: Re: n_distinct off by a factor of 1000
Date:
Msg-id: exYVuFsfuM3l2SdUeIoTOCojYDWE4uqL_jlZBZo0MB5ij7XZ8hlaOVzNKHJyRhb26mwAIaNxil96YgzzRxd1Tps2T4YrJUHMOtGraAi_hW8=@protonmail.com
In reply to: Re: n_distinct off by a factor of 1000  (Michael Lewis <mlewis@entrata.com>)
List: pgsql-general

If we could increase the sampling ratio beyond the hard-coded 300x to get a more representative sample, and use that to estimate n_distinct (and the frequencies of the most common values), but only actually store the 100 MCVs (or whatever the statistics target is set to for the system or column), then the issue might be mitigated without increasing planning time from stats that are larger than prudent. The "only" cost should be longer processing time when (auto)analyzing, plus the overhead of considering such a new setting in all analyze cases, I suppose.
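Today the two are coupled: raising the per-column target enlarges the sample (ANALYZE reads roughly 300 * target rows) but also enlarges what gets stored and scanned at plan time. A minimal sketch of the current behaviour, assuming a hypothetical table bridge with column b:

    -- raising the target enlarges the ANALYZE sample (about 300 * 5000 rows here),
    -- but also stores up to 5000 MCVs and histogram buckets for the planner to read
    ALTER TABLE bridge ALTER COLUMN b SET STATISTICS 5000;
    ANALYZE bridge;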

I found another large deviation in one of my bridge tables. It is an (int,int) table of 900M rows where column B contains 2.7M distinct values, yet pg_stats claims only about 10,400. These numbers are with a statistics target of 500. I'm not sure it really matters for the planner for the queries I run, but it makes me a little nervous :)
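For reference, a sketch of how such a deviation can be checked and, if needed, pinned manually (table and column names are placeholders; the override only takes effect at the next ANALYZE):

    -- what ANALYZE recorded (negative values mean a fraction of reltuples)
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'bridge' AND attname = 'b';

    -- the exact count for comparison (expensive on 900M rows)
    SELECT count(DISTINCT b) FROM bridge;

    -- manual override if the estimate stays badly off
    ALTER TABLE bridge ALTER COLUMN b SET (n_distinct = 2700000);
    ANALYZE bridge;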

Also, is it just my data samples, or is n_distinct underestimated far more often, and by a larger ratio, than it is overestimated?

K
