Re: Weighted Stats

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Weighted Stats
Дата
Msg-id 0caf72a3-0d81-cdf8-f525-f58aa8df3449@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Weighted Stats  (David Fetter <david@fetter.org>)
Ответы Re: Weighted Stats  (David Fetter <david@fetter.org>)
Список pgsql-hackers
Hi,

On 03/19/2016 07:34 AM, David Fetter wrote:
> On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:
>> On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david@fetter.org> wrote:
>>>
>>> Please find attached a patch that uses the float8 version to cover the
>>> numeric types.
>>
>> Is there a well-defined meaning for having a negative weight?  If no,
>> should it be disallowed?
>
> Opinions on this appear to vary.  A Wikipedia article defines weights
> as non-negative, while a manual to which it refers only uses non-zero.
>
> https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition
> https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html

I don't think that actually allows negative weights. It says that
   w_i = 1/\sigma_i^2

and variance is always > 0, thus w_i > 0. The zero is used as a special 
flag to remove the sample from the data set in a simple way.

> I'm not sure which if either would be authoritative, but I could
> certainly make up variants for each assumption.
>
> The assumption they have in common about weights is that a zero
> weight is not part of the calculation, which assumption is
> implemented in the previously submitted code.

I think that if we're not sure what should happen with negative weights, 
then we should disallow them. It's easy to allow them later once we have 
a reasonable definition, but if we allow them now and later realize it 
should behave differently, we'll be in trouble because of breaking 
existing uses.

I can't really come up with a reasonable example that would actually use 
negative weights. Can you? That would probably help with defining the 
behavior correctly.

Allowing negative weights has other consequences. For example, what if 
sum(W) ends up being 0? For example

CREATE TABLE t (a float, b float);
INSERT INTO  t SELECT i,  1 FROM generate_series(1,1000) s(i);
INSERT INTO  t SELECT i, -1 FROM generate_series(1,1000) s(i);

SELECT weighted_avg(a,b) FROM t; weighted_avg
--------------          NaN
(1 row)

Is that the correct behavior? Why?

So -1 to allowing negative weights, unless we can come up with proper 
definition or at least good examples demonstrating the usefulness.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: pgbench stats per script & other stuff
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench stats per script & other stuff