Re: Add min and max execute statement time in pg_stat_statement

From: David Fetter <david(at)fetter(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2015-02-19 18:10:54
Message-ID: 20150219181054.GB8831@fetter.org
Lists: pgsql-hackers

On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
> On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > On 02/18/2015 08:34 PM, David Fetter wrote:
> >
> >> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
> >>
> >>> On 1/20/15 6:32 PM, David G Johnston wrote:
> >>>
> >>>> In fact, as far as the database knows, the values provided to this
> >>>> function do represent an entire population and such a correction
> >>>> would be unnecessary. I guess it boils down to whether "future"
> >>>> queries are considered part of the population or whether the
> >>>> population changes upon each query being run and thus we are
> >>>> calculating the ever-changing population variance.
>
> > I think we should be calculating the population variance.
>
> >> Why population variance and not sample variance? In distributions
> >> where the second moment about the mean exists, it's an unbiased
> >> estimator of the variance. In this, it's different from the
> >> population variance.
>
> > Because we're actually measuring the whole population, and not a sample?

We're not. We're taking a sample, which is to say past measurements,
and using it to make inferences about the population, which includes
all queries in the future.
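To make the distinction concrete, here is a minimal Python sketch (not the actual pg_stat_statements C code) using Welford's online algorithm, which accumulates a running mean and sum of squared deviations; the only difference between the two variances under discussion is the final divisor, n versus n - 1:

```python
# Sketch of the population vs. sample variance distinction.
# Welford's online algorithm tracks the running mean and the sum of
# squared deviations (M2) in a single pass, which is the usual way to
# accumulate variance incrementally without storing every measurement.

def welford(times):
    """Return (n, mean, M2) accumulated over an iterable of timings."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in times:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2

n, mean, m2 = welford([10.0, 12.0, 11.0, 13.0])
pop_var = m2 / n           # treats the measurements as the whole population
sample_var = m2 / (n - 1)  # Bessel's correction: treats them as a sample
```

With these four timings the population variance is 1.25 while the sample variance is 5/3; the sample estimator is larger precisely because it accounts for the measurements being a sample of a bigger (future-inclusive) population.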

> The key incorrect word in David Fetter's statement is "estimator". We are
> not estimating anything but rather providing descriptive statistics for a
> defined population.

See above.

> Users extrapolate that the next member added to the population would be
> expected to conform to this statistical description without bias (though
> see below). We can also then define the new population by including this
> new member and generating new descriptive statistics (which allows for
> evolution to be captured in the statistics).
>
> Currently (I think) we allow the end user to kill off the entire population
> and build up from scratch. In the short term this limits the ability to
> predict the attributes of future members, but once the population has
> reached a statistically significant size, new predictions will no longer be
> skewed by population members whose attributes were defined in an older and
> possibly significantly different environment. In theory it would be nice to
> give the user the ability to specify - by time or percentage - a subset of
> the population to leave alive.

I agree that stale numbers can fuzz things in a way we don't like, but
let's not let too much feature creep in here.

> Actual time-weighted sampling would be an alternative but likely one
> significantly more difficult to accomplish.

Right.
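For what it's worth, the simplest form of the time-weighted idea would be
something like an exponentially weighted moving average, which discounts
old measurements continuously instead of requiring a hard reset. This is
only a sketch of the concept; `alpha` is a hypothetical smoothing
parameter, not anything pg_stat_statements exposes:

```python
# Hedged sketch of time-weighted averaging: each new measurement gets
# weight alpha, and all prior history decays by (1 - alpha), so stale
# environments fade out gradually rather than being dropped all at once.

def ewma(times, alpha=0.2):
    """Exponentially weighted moving average of execution times."""
    avg = None
    for x in times:
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
    return avg
```

The harder part, as noted above, is doing the equivalent for the variance
and extrema, which is presumably why it is "significantly more difficult
to accomplish."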

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
