Re: Estimating number of distinct values.

Поиск

Список

Период

Сортировка

От	Jeff Janes
Тема	Re: Estimating number of distinct values.
Дата	24 октября 2018 г. 17:47:37
Msg-id	CAMkU=1wzo9B+6BXYMH0btTYudv9-cigPdd-64Exr852oYL_2mw@mail.gmail.com обсуждение исходный текст
Ответ на	Estimating number of distinct values. (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, Oct 24, 2018 at 10:07 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Real number of distinct value for this dataset is about 10 millions. For
some reasons, sampling using random blocks and Vitter algorithm produces
worser results than just examining first 30000 rows of the table:

It is a known problem with our sampling method that once a block is chosen, it then oversamples rows from that block[1]. So you get too many blocks with 0 rows sampled or 2 or more rows samples, and too few with exactly one row sampled. If rows with the same value are clustered together into same blocks, this will find too many duplicates and really skew the Duj1 estimate, because we feeding it a biased sample.

Tomas was working on a patch to make the sampling truly random[2], but I think he abandoned it to work on the multivariate statistics instead. It is hard to tell if the IO implications of no longer reading sampled blocks in physical order would be acceptable, because everyone has different hardware, data, and ideas of what is acceptable.

[1] https://www.postgresql.org/message-id/CAMkU=1wRH_jopyCAyUKbdQY4DWhsx1-1e2s0VVgfrryfXDe2SQ@mail.gmail.com

[2] https://www.postgresql.org/message-id/5588D644.1000909%402ndquadrant.com

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Предыдущее

От: "David G. Johnston"
Дата: 24 октября 2018 г., 17:40:00
Сообщение: Re: JSON validation behavior

Следующее

От: Hironobu SUZUKI
Дата: 24 октября 2018 г., 17:55:57
Сообщение: Re: pgbench - add pseudo-random permutation function

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Estimating number of distinct values.

Предыдущее

Следующее