Re: Weighted Stats

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weighted Stats
Date: 2016-03-19 01:12:12
Message-ID: CAMkU=1y45wFL72-HkFo9SDR=Gont9qxR3=H+JzuH7=ok=PQb7w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david(at)fetter(dot)org> wrote:
>
> Please find attached a patch that uses the float8 version to cover the
> numeric types.

Is there a well-defined meaning for a negative weight? If not,
should negative weights be disallowed?

I don't know what I was expecting, but not this:

select weighted_avg(x,10000000-2*x) from generate_series(1,10000000) f(x);
weighted_avg
------------------
16666671666717.1
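
For reference, a minimal sketch of where a number like that can come from. It assumes weighted_avg computes the textbook sum(w*x)/sum(w); the patch's actual implementation is not shown here, and the helper names below are hypothetical. With weights 10000000-2*x the weights sum to -10000000, and the ratio works out to (n+1)(n+2)/6, far outside the range of x:

```python
from fractions import Fraction

def weighted_avg(values, weights):
    """Textbook weighted mean sum(w*x)/sum(w), in exact rational arithmetic.

    Assumption: this is the definition under test, not the patch's code.
    """
    total_w = sum(weights)
    total_wx = sum(w * x for x, w in zip(values, weights))
    return Fraction(total_wx, total_w)

def query_result(n):
    """Same shape as the query above: x = 1..n, weight = n - 2*x."""
    xs = range(1, n + 1)
    ws = [n - 2 * x for x in xs]
    return weighted_avg(xs, ws)

def closed_form(n):
    """sum(w) = -n and sum(w*x) = -n(n+1)(n+2)/6, so the ratio is (n+1)(n+2)/6."""
    return Fraction((n + 1) * (n + 2), 6)

# Brute force on a small n agrees with the closed form.
print(query_result(1000) == closed_form(1000))

# Closed form at the n used in the query, avoiding a 10-million-row loop.
print(closed_form(10_000_000))
```

Under that assumption the exact value for n = 10000000 is 16666671666667, which agrees with the float8 output above to about eleven significant digits. So the result looks arithmetically consistent with the formula, just not meaningful as an average.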

Also, I think it might not give the correct answer even without
negative weights:

create table foo as select floor(random()*10000)::int val from
generate_series(1,10000000);

create table foo2 as select val, count(*) from foo group by val;

Shouldn't these two queries then give the same result?

select stddev_samp(val) from foo;
stddev_samp
-------------------
2887.054977297105

select weighted_stddev_samp(val,count) from foo2;
weighted_stddev_samp
----------------------
2887.19919651336

The results already differ in the 5th significant digit, which seems
too early to be seeing round-off error.
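
A minimal sketch of the expectation behind that comparison, assuming the weights are treated as integer frequencies (each row of foo2 stands for count identical rows of foo). The function below is an illustrative definition under that assumption, not the patch's code; other weighting conventions, such as reliability weights, use a different denominator and would not be expected to match:

```python
import math
import random
import statistics
from collections import Counter

def weighted_stddev_samp(values, weights):
    """Sample standard deviation with integer frequency weights.

    Assumption: a weight w means "this value occurred w times", so the
    denominator is sum(w) - 1, mirroring the unweighted n - 1.
    """
    total_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_w
    sum_sq = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return math.sqrt(sum_sq / (total_w - 1))

# Analogue of "foo": raw integer values in [0, 10000).
random.seed(0)
raw = [random.randrange(10000) for _ in range(100_000)]

# Analogue of "foo2": one row per distinct value with its count.
counts = Counter(raw)
vals = list(counts.keys())
cnts = list(counts.values())

unweighted = statistics.stdev(raw)
weighted = weighted_stddev_samp(vals, cnts)

print(unweighted, weighted)
```

With this definition the two figures should agree to well beyond five digits, which is why a difference in the 5th digit looks more like a difference in formula than like round-off.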

Cheers,

Jeff
