Re: Weighted Stats

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Weighted Stats
Date: 2016-03-19 16:04:08
Message-ID: 0caf72a3-0d81-cdf8-f525-f58aa8df3449@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 03/19/2016 07:34 AM, David Fetter wrote:
> On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:
>> On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david(at)fetter(dot)org> wrote:
>>>
>>> Please find attached a patch that uses the float8 version to cover the
>>> numeric types.
>>
>> Is there a well-defined meaning for having a negative weight? If no,
>> should it be disallowed?
>
> Opinions on this appear to vary. A Wikipedia article defines weights
> as non-negative, while a manual to which it refers only uses non-zero.
>
> https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition
> https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html

I don't think that actually allows negative weights. It says that

w_i = 1/\sigma_i^2

and variance is always > 0, thus w_i > 0. The zero is used as a special
flag to remove the sample from the data set in a simple way.

> I'm not sure which if either would be authoritative, but I could
> certainly make up variants for each assumption.
>
> The assumption they have in common about weights is that a zero
> weight is not part of the calculation, which assumption is
> implemented in the previously submitted code.

I think that if we're not sure what should happen with negative weights,
then we should disallow them. It's easy to allow them later once we have
a reasonable definition, but if we allow them now and later realize it
should behave differently, we'll be in trouble because of breaking
existing uses.

I can't really come up with a reasonable example that would actually use
negative weights. Can you? That would probably help with defining the
behavior correctly.

Allowing negative weights has other consequences. For example, what if
sum(W) ends up being 0? For example

CREATE TABLE t (a float, b float);
INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i);
INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i);

SELECT weighted_avg(a,b) FROM t;
weighted_avg
--------------
NaN
(1 row)

Is that the correct behavior? Why?

So -1 to allowing negative weights, unless we can come up with proper
definition or at least good examples demonstrating the usefulness.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2016-03-19 16:08:20 Re: pgbench stats per script & other stuff
Previous Message Alvaro Herrera 2016-03-19 15:41:19 Re: pgbench stats per script & other stuff