Re: n_distinct off by a factor of 1000

From: Klaudie Willis
Subject: Re: n_distinct off by a factor of 1000
Date:
Msg-id: exYVuFsfuM3l2SdUeIoTOCojYDWE4uqL_jlZBZo0MB5ij7XZ8hlaOVzNKHJyRhb26mwAIaNxil96YgzzRxd1Tps2T4YrJUHMOtGraAi_hW8=@protonmail.com
In reply to: Re: n_distinct off by a factor of 1000  (Michael Lewis <mlewis@entrata.com>)
List: pgsql-general

If we could increase the sampling ratio beyond the hard-coded 300x to get a more representative sample, and use that to estimate n_distinct (and the frequencies of the most common values), but only actually store the 100 MCVs (or whatever the statistics target is set to for the system or column), then the issue might be mitigated without increasing planning time from stats that are larger than prudent. The "only" cost should be longer processing time when (auto)analyzing, plus the overhead of considering such a new setting in all analyze cases, I suppose.
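Today the two are coupled: raising the per-column target enlarges the sample (ANALYZE reads roughly 300 * target rows) but also enlarges what gets stored and scanned at plan time. A minimal sketch of the current behaviour, assuming a hypothetical table bridge with column b:

    -- raising the target enlarges the ANALYZE sample (about 300 * 5000 rows here),
    -- but also stores up to 5000 MCVs and histogram buckets for the planner to read
    ALTER TABLE bridge ALTER COLUMN b SET STATISTICS 5000;
    ANALYZE bridge;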

I found another large deviation in one of my bridge tables. It is an (int,int) table of 900M rows where column B contains 2.7M distinct values, yet pg_stats claims only about 10,400. These numbers are with a statistics target of 500. I'm not sure it really matters for the planner for the queries I run, but it makes me a little nervous :)
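For reference, a sketch of how such a deviation can be checked and, if needed, pinned manually (table and column names are placeholders; the override only takes effect at the next ANALYZE):

    -- what ANALYZE recorded (negative values mean a fraction of reltuples)
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'bridge' AND attname = 'b';

    -- the exact count for comparison (expensive on 900M rows)
    SELECT count(DISTINCT b) FROM bridge;

    -- manual override if the estimate stays badly off
    ALTER TABLE bridge ALTER COLUMN b SET (n_distinct = 2700000);
    ANALYZE bridge;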

Also, is it just my data samples, or is n_distinct underestimated far more often, and by a larger ratio, than it is overestimated?

K
