Re: Add min and max execute statement time in pg_stat_statement

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2015-02-19 18:19:58
Message-ID: CAKFQuwb2Up+rMMxa3Wkx4DjwdKrgmiOzGGTnsr6=Lb3uJKy1zw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 19, 2015 at 11:10 AM, David Fetter <david(at)fetter(dot)org> wrote:

> On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
> > On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > > On 02/18/2015 08:34 PM, David Fetter wrote:
> > >
> > >> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
> > >>
> > >>> On 1/20/15 6:32 PM, David G Johnston wrote:
> > >>>
> > >>>> In fact, as far as the database knows, the values provided to this
> > >>>> function do represent an entire population and such a correction
> > >>>> would be unnecessary. I guess it boils down to whether "future"
> > >>>> queries are considered part of the population or whether the
> > >>>> population changes upon each query being run and thus we are
> > >>>> calculating the ever-changing population variance.
> >
> > > I think we should be calculating the population variance.
> >
> > >> Why population variance and not sample variance? In distributions
> > >> where the second moment about the mean exists, it's an unbiased
> > >> estimator of the variance. In this, it's different from the
> > >> population variance.
> >
> > > Because we're actually measuring the whole population, and not a
> > > sample?
>
> We're not. We're taking a sample, which is to say past measurements,
> and using it to make inferences about the population, which includes
> all queries in the future.
>
>

"All past measurements" does not qualify as a "random sample" of a
population made up of all past measurements and any potential members that
may be added in the future. Without the "random sample" aspect you throw
away all pretense of avoiding bias so you might as well just call the
totality of the past measurements the population, describe them using
population descriptive statistics, and call it a day.

For large populations it isn't going to matter anyway, but for small
populations dividing by n - 1 instead of n inflates the variance, which
seems like it would make the past performance look worse (more variable)
than it actually was.
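
To put rough numbers on the divisor difference, here is a minimal Python
sketch. The timings are made up for illustration and the helper name is
hypothetical; it says nothing about how pg_stat_statements would store or
compute its values, and only compares the two textbook formulas using the
standard library's statistics module.

```python
import statistics


def variance_pair(timings):
    """Return (population variance, sample variance) for a list of timings.

    pvariance divides the sum of squared deviations by n,
    variance divides it by n - 1.
    """
    return statistics.pvariance(timings), statistics.variance(timings)


# Hypothetical execution times in milliseconds for a rarely-run statement.
small = [10.0, 12.0, 14.0]
pop_small, samp_small = variance_pair(small)

# The same values repeated 1000 times, standing in for a heavily-run statement.
large = small * 1000
pop_large, samp_large = variance_pair(large)

# The ratio sample/population is n / (n - 1): 1.5 for n = 3,
# and very close to 1 for n = 3000.
print("n = 3:    population", pop_small, "sample", samp_small)
print("n = 3000: ratio", samp_large / pop_large)
```

With three calls the sample formula reports a variance half again as large
as the population formula; with three thousand calls the two are nearly
indistinguishable.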

David J.
