Re: Weighted Stats

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Weighted Stats
Date: 2016-03-19 08:45:23
Message-ID: CAMkU=1yH4KD=WaRUOwhtTvtMtwQE0wzRgJMCdeFPWeuJZkL-iQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 18, 2016 at 11:34 PM, David Fetter <david(at)fetter(dot)org> wrote:
> On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:

>> Also, I think it might not give the correct answer even without
>> negative weights:
>>
>> create table foo as select floor(random()*10000)::int val from
>> generate_series(1,10000000);
>>
>> create table foo2 as select val, count(*) from foo group by val;
>>
>> Shouldn't these then give the same result:
>>
>> select stddev_samp(val) from foo;
>> stddev_samp
>> -------------------
>> 2887.054977297105
>>
>> select weighted_stddev_samp(val,count) from foo2;
>> weighted_stddev_samp
>> ----------------------
>> 2887.19919651336
>>
>> The 5th digit seems too early to be seeing round-off error.
>
> Please pardon me if I've misunderstood, but you appear to be assuming
> that
>
> SELECT val, count(*) FROM foo GROUP BY val
>
> will produce precisely identical count(*)s at each row, which it
> overwhelmingly likely won't, producing the difference you see above.

I think the count for each val that gets put in foo2.count should be
the same as the weight of that val as it occurs in foo. Surely they
shouldn't all have the same weight in foo2, unless they all have the
same number of appearances in foo. Which, as you say, they are not
likely to. But still, the foo2.count that they do individually get
should be equal to their weight, shouldn't it?

The other two methods (*avg and *stddev_pop) do give the same answers
using the two different methods (unweighted against foo, weighted
against foo2)

Cheers,

Jeff

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-03-19 10:13:27 Re: Performance degradation in commit ac1d794
Previous Message Fabien COELHO 2016-03-19 07:29:15 Re: incorrect docs for pgbench / skipped transactions