Re: Weighted Stats

From: David Fetter <david(at)fetter(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Weighted Stats
Date: 2016-04-08 20:47:56
Message-ID: 20160408204756.GA15407@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 20, 2016 at 03:38:40PM -0700, David Fetter wrote:
> On Sat, Mar 19, 2016 at 05:04:08PM +0100, Tomas Vondra wrote:
> > Hi,
> >
> > On 03/19/2016 07:34 AM, David Fetter wrote:
> > >On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:
> > >>On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david(at)fetter(dot)org> wrote:
> > >>>
> > >>>Please find attached a patch that uses the float8 version to cover the
> > >>>numeric types.
> > >>
> > >>Is there a well-defined meaning for having a negative weight? If no,
> > >>should it be disallowed?
> > >
> > >Opinions on this appear to vary. A Wikipedia article defines weights
> > >as non-negative, while a manual to which it refers only uses non-zero.
> > >
> > >https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition
> > >https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html
> >
> > I don't think that actually allows negative weights. It says that
> >
> > w_i = 1/\sigma_i^2
> >
> > and variance is always > 0, thus w_i > 0. The zero is used as a special flag
> > to remove the sample from the data set in a simple way.
> >
> > >I'm not sure which if either would be authoritative, but I could
> > >certainly make up variants for each assumption.
> > >
> > >The assumption they have in common about weights is that a zero
> > >weight is not part of the calculation, which assumption is
> > >implemented in the previously submitted code.
> >
> > I think that if we're not sure what should happen with negative weights,
> > then we should disallow them. It's easy to allow them later once we have a
> > reasonable definition, but if we allow them now and later realize it should
> > behave differently, we'll be in trouble because of breaking existing uses.
>
> OK
>
> > I can't really come up with a reasonable example that would actually use
> > negative weights. Can you? That would probably help with defining the
> > behavior correctly.
>
> No, but I'm not a statistician. I've seen them mentioned in contexts
> that appear to be discussions among same, and again opinions vary.
>
> > Allowing negative weights has other consequences. For example, what if
> > sum(W) ends up being 0? For example
> >
> > CREATE TABLE t (a float, b float);
> > INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i);
> > INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i);
> >
> > SELECT weighted_avg(a,b) FROM t;
> > weighted_avg
> > --------------
> > NaN
> > (1 row)
> >
> > Is that the correct behavior? Why?
>
> It's not, and you're right.
>
> I will send a patch that disallows negative weights this evening or
> tomorrow. It will be slightly more complicated as I believe I will
> need to create a new accumulator function for the weighted_avg() case
> where I had been using an extant one before.
>
> Cheers,
> David.

Sorry about the delay. This patch disallows negative weights,
although it still has that odd difference Jeff found.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment Content-Type Size
weighted_stats_003.diff text/plain 19.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-04-08 21:02:12 Re: pg_hba_lookup function to get all matching pg_hba.conf entries
Previous Message Tom Lane 2016-04-08 20:36:55 Re: pg_hba_lookup function to get all matching pg_hba.conf entries