On 24.12.2010 13:37, Florian Pflug wrote:
> On Dec24, 2010, at 11:23 , Nicolas Barbier wrote:
>
>> 2010/12/24 Florian Pflug <fgp@phlo.org>:
>>
>>> On Dec23, 2010, at 20:39 , Tomas Vondra wrote:
>>>
>>>> I guess we could use the highest possible value (equal to the number
>>>> of tuples) - according to Wikipedia you need about 10 bits per
>>>> element with a 1% error rate, i.e. about 10MB of memory for each
>>>> million elements.
>>>
>>> Drat. I had expected these numbers to come out quite a bit lower than
>>> that, at least for a higher error target. But even with 10% false
>>> positive rate, it's still 4.5MB per 1e6 elements. Still too much to
>>> assume the filter will always fit into memory, I fear :-(
>>
>> I have the impression that both of you are forgetting that there are 8
>> bits in a byte. 10 bits per element = 1.25MB per million elements.
>
> Uh, of course. So in the real universe, the numbers are
>
> ~1.2MB per 1e6 elements for a false positive rate of 1%
> ~0.5MB per 1e6 elements for a false positive rate of 10%
>
> Hm. So for a table with a billion distinct elements, we'd need half
> a gigabyte per column for the filter. A tuple with two int columns
> takes at least 24 + 2*4 = 32 bytes to store, I think, making such a table
> at least 32GB in size. The filter size would thus be 1/64 of the table
> size in the worst case.
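For the record, those figures follow from the textbook formula for an
optimal Bloom filter: m/n = -ln(p) / (ln 2)^2 bits per element. A quick
standalone sketch that reproduces them (plain C, just the formula, not
code from any patch):

    #include <math.h>
    #include <stdio.h>

    /* bytes needed for an optimal Bloom filter over n elements at
     * false positive rate p: n * -ln(p) / (ln 2)^2 bits, divided by 8 */
    static double
    bloom_filter_bytes(double n, double p)
    {
        double ln2 = log(2.0);

        return n * (-log(p) / (ln2 * ln2)) / 8.0;
    }

    int
    main(void)
    {
        /* ~1.2MB per 1e6 elements at p = 1%, ~0.6MB at p = 10% */
        printf("p=1%%:  %.2f MB per 1e6 elements\n",
               bloom_filter_bytes(1e6, 0.01) / 1e6);
        printf("p=10%%: %.2f MB per 1e6 elements\n",
               bloom_filter_bytes(1e6, 0.10) / 1e6);

        /* a billion distinct values at p = 10%: ~600MB, i.e. roughly
         * the half a gigabyte mentioned above */
        printf("1e9 elements, p=10%%: %.0f MB\n",
               bloom_filter_bytes(1e9, 0.10) / 1e6);

        return 0;
    }
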
I've read several papers on estimating the number of distinct values,
and I have both bad and good news. I don't want this thread to grow
indefinitely, and it's a rather separate topic, so I'll start a new
thread for the ndistinct estimates.
regards
Tomas