Обсуждение: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

Поиск

Список

Период

Сортировка

Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

От

"Dave Held"

Дата:

26 апреля 2005 г., 23:08:13

> -----Original Message-----
> From: Gurmeet Manku [mailto:manku@CS.Stanford.EDU]
> Sent: Tuesday, April 26, 2005 5:01 PM
> To: Simon Riggs
> Cc: Tom Lane; josh@agliodbs.com; Greg Stark; Marko Ristola;
> pgsql-perform; pgsql-hackers@postgresql.org; Utkarsh Srivastava;
> snt@iastate.edu
> Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
> suggested?
>
> [...]
>  2. In a single scan, it is possible to estimate n_distinct by using
>     a very simple algorithm:
>
>  "Distinct sampling for highly-accurate answers to distinct value
>   queries and event reports" by Gibbons, VLDB 2001.
>
>  http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf
>
> [...]

This paper looks the most promising, and isn't too different
from what I suggested about collecting stats over the whole table
continuously.  What Gibbons does is give a hard upper bound on
the sample size by using a logarithmic technique for storing
sample information.  His technique appears to offer very good
error bounds and confidence intervals as shown by tests on
synthetic and real data.  I think it deserves a hard look from
people hacking the estimator.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

Re: [HACKERS] Bad n_distinct estimation; hacks suggested?