Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From Josh Berkus
Subject Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date
Msg-id 200504251213.18565.josh@agliodbs.com
In response to Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Josh Berkus <josh@agliodbs.com>)
Re: [PERFORM] Bad n_distinct estimation; hacks suggested?  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Simon, Tom:

While it's not possible to get accurate estimates from a fixed-size sample, I
think it would be possible from a small but scalable sample: say, 0.1% of all
data pages on large tables, up to the limit of maintenance_work_mem.
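To make the page-based idea concrete, here is a minimal sketch of picking which pages to read. It is only an illustration of the proposal, not how ANALYZE works; the function name choose_sample_pages and its parameters (fraction, max_pages, seed) are made up for this example, and max_pages stands in for whatever cap maintenance_work_mem would impose.

```python
import random


def choose_sample_pages(total_pages, fraction=0.001, max_pages=None, seed=None):
    """Return a sorted list of distinct page numbers to sample.

    fraction  -- share of the table's pages to read (0.001 = 0.1%)
    max_pages -- hypothetical cap, e.g. derived from maintenance_work_mem
    seed      -- optional seed so a run can be repeated
    """
    # Scale the sample with the table, but always read at least one page.
    target = max(1, round(total_pages * fraction))
    if max_pages is not None:
        target = min(target, max_pages)
    # Never ask for more pages than the table has.
    target = min(target, total_pages)

    rng = random.Random(seed)
    # Sampling without replacement; sorting lets the pages be read
    # in physical order.
    return sorted(rng.sample(range(total_pages), target))


if __name__ == "__main__":
    pages = choose_sample_pages(1_000_000, seed=42)
    print(len(pages), "pages chosen, first few:", pages[:5])
```

The sample grows with the table instead of staying at a fixed size, which is the whole point; the cap only kicks in on very large tables.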

Setting up these samples as a % of data pages, rather than a pure random sort,
makes this more feasible; for example, a 70GB table would only need to sample
about 9000 data pages (or 70MB).  Of course, larger samples would lead to
better accuracy, and this could be set through a revised GUC (e.g.,
maximum_sample_size, minimum_sample_size).
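The arithmetic behind the 70GB figure can be checked with a short sketch. It assumes the default 8kB block size and, as a simplification, that every sampled page is held in memory whole. The function sample_size_pages and its min_pages / max_pages arguments are hypothetical stand-ins for the minimum_sample_size and maximum_sample_size settings floated above, not existing GUCs.

```python
PAGE_SIZE = 8192          # assumption: default PostgreSQL block size (8kB)
GB = 1024 ** 3
MB = 1024 ** 2


def sample_size_pages(table_bytes, fraction=0.001, mem_limit_bytes=None,
                      min_pages=0, max_pages=None, page_size=PAGE_SIZE):
    """Number of pages to sample for a table of the given size.

    mem_limit_bytes plays the role of maintenance_work_mem;
    min_pages / max_pages are hypothetical settings.
    """
    total_pages = table_bytes // page_size
    pages = int(total_pages * fraction)
    pages = max(pages, min_pages)
    if max_pages is not None:
        pages = min(pages, max_pages)
    if mem_limit_bytes is not None:
        # Simplification: the whole sampled page must fit in memory.
        pages = min(pages, mem_limit_bytes // page_size)
    # A small table cannot supply more pages than it has.
    return min(pages, total_pages)


if __name__ == "__main__":
    pages = sample_size_pages(70 * GB)
    print(pages, "pages,", round(pages * PAGE_SIZE / MB, 1), "MB")
```

For a 70GB table this gives 9175 pages, a little under 72MB, which matches the rough "9000 pages (or 70MB)" figure. With a 16MB memory limit the same table would be capped at 2048 pages.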

I just need a little help doing the math ... please?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
