Re: Add min and max execute statement time in pg_stat_statement

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Arne Scheffer <arne(dot)scheffer(at)uni-muenster(dot)de>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2015-01-21 14:46:53
Message-ID: 54BFBBDD.8080302@dunslane.net
Lists: pgsql-hackers


On 01/21/2015 09:27 AM, Arne Scheffer wrote:
> Sorry, here is a corrected second attempt; the first had copy-and-paste mistakes:
> Best regards, Arne
>
>> Comments appreciated.
>> Definition: var_samp = (sum of squared differences) / (n-1)
>> Definition: stddev_samp = sqrt(var_samp)
>> Example N=4
>> 1.) Sum of squared differences
>> 1_4Sum(Xi-XM4)²
>> =
>> 2.) adding nothing
>> 1_4Sum(Xi-XM4)²
>> +0
>> +0
>> +0
>> =
>> 3.) nothing changed
>> 1_4Sum(Xi-XM4)²
>> +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
>> +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
>> +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
>> =
>> 4.) parts reordered
>> (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
>> +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
>> +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
>> +1_1Sum(Xi-XM1)²
>> =
>> 5.)
>> (X4-XM4)(X4-XM3)
>> + (X3-XM3)(X3-XM2)
>> + (X2-XM2)(X2-XM1)
>> + (X1-XM1)²
>> =
>> 6.) XM1 = X1, so there it is: the iteration step of Welford's algorithm (in reverse order):
>> (X4-XM4)(X4-XM3)
>> + (X3-XM3)(X3-XM2)
>> + (X2-XM2)(X2-X1)
>> + 0
>> The missing piece is the step from 4.) to 5.); it is plain algebra, see e.g.:
>> http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
>
>
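A sketch of the algebra behind the step from 4.) to 5.), in standard notation: x_i stands for Xi, M_k for XMk (the mean of the first k values), and S_k for 1_kSum(Xi-XMk)² (the sum of squared differences over the first k values). Each bracket in 4.) is S_k - S_{k-1}, and the identity below turns it into the product in 5.):

```latex
\begin{align*}
M_k &= \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad
S_k = \sum_{i=1}^{k} (x_i - M_k)^2 = \sum_{i=1}^{k} x_i^2 - k M_k^2 \\
S_k - S_{k-1} &= x_k^2 - k M_k^2 + (k-1) M_{k-1}^2 \\
  &= x_k^2 - k M_k^2 + M_{k-1}\,(k M_k - x_k)
     && \text{since } (k-1) M_{k-1} = k M_k - x_k \\
  &= x_k^2 - x_k M_{k-1} - M_k \cdot k\,(M_k - M_{k-1}) \\
  &= x_k^2 - x_k M_{k-1} - M_k\,(x_k - M_{k-1})
     && \text{since } k\,(M_k - M_{k-1}) = x_k - M_{k-1} \\
  &= (x_k - M_k)(x_k - M_{k-1})
\end{align*}
\[
S_n = S_1 + \sum_{k=2}^{n} (S_k - S_{k-1})
    = \sum_{k=2}^{n} (x_k - M_k)(x_k - M_{k-1}),
\qquad
\mathrm{var}_{\mathrm{samp}} = \frac{S_n}{n-1},
\]
```

using S_1 = 0 because M_1 = x_1. The final sum is exactly the per-sample update that Welford's method accumulates.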

I have no idea what you are saying here.

Here are comments from an email to me by the author of
<http://www.johndcook.com/blog/standard_deviation>, regarding the divisor
used:

My code is using the unbiased form of the sample variance, dividing
by n-1.

It's usually not worthwhile to make a distinction between a sample
and a population because the "population" is often itself a sample.
For example, if you could measure the height of everyone on earth at
one instance, that's the entire population, but it's still a sample
from all who have lived and who ever will live.

Also, for large n, there's hardly any difference between 1/n and
1/(n-1).

Maybe I should add that to the code comments. Otherwise, I don't think
we need a change.

cheers

andrew
