Re: Weighted Stats

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Weighted Stats
Дата
Msg-id CAMkU=1yH4KD=WaRUOwhtTvtMtwQE0wzRgJMCdeFPWeuJZkL-iQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Weighted Stats  (David Fetter <david@fetter.org>)
Список pgsql-hackers
On Fri, Mar 18, 2016 at 11:34 PM, David Fetter <david@fetter.org> wrote:
> On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:

>> Also, I think it might not give the correct answer even without
>> negative weights:
>>
>> create table foo as select floor(random()*10000)::int val from
>> generate_series(1,10000000);
>>
>> create table foo2 as select val, count(*) from foo group by val;
>>
>> Shouldn't these then give the same result:
>>
>> select stddev_samp(val) from foo;
>>     stddev_samp
>> -------------------
>>  2887.054977297105
>>
>> select weighted_stddev_samp(val,count) from foo2;
>>  weighted_stddev_samp
>> ----------------------
>>      2887.19919651336
>>
>> The 5th digit seems too early to be seeing round-off error.
>
> Please pardon me if I've misunderstood, but you appear to be assuming
> that
>
>     SELECT val, count(*) FROM foo GROUP BY val
>
> will produce precisely identical count(*)s at each row, which it
> overwhelmingly likely won't, producing the difference you see above.

I think the count for each val that gets put in foo2.count should be
the same as the weight of that val as it occurs in foo.  Surely they
shouldn't all have the same weight in foo2, unless they all have the
same number of appearances in foo.  Which, as you say, they are not
likely to.  But still, the foo2.count that they do individually get
should be equal to their weight, shouldn't it?

The other two methods (*avg and *stddev_pop) do give the same answers
using the two different methods (unweighted against foo, weighted
against foo2)

Cheers,

Jeff



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: incorrect docs for pgbench / skipped transactions
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Performance degradation in commit ac1d794