Re: Add min and max execute statement time in pg_stat_statement

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2015-02-19 03:31:09
Message-ID: CAKFQuwZL4DTEuimUeVYSE=f_itYoaVaPBMTUiHkj7EMTDZQr0g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

> On 02/18/2015 08:34 PM, David Fetter wrote:
>> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
>>> On 1/20/15 6:32 PM, David G Johnston wrote:
>>>> In fact, as far as the database knows, the values provided to this
>>>> function do represent an entire population and such a correction
>>>> would be unnecessary. I guess it boils down to whether "future"
>>>> queries are considered part of the population or whether the
>>>> population changes upon each query being run and thus we are
>>>> calculating the ever-changing population variance.
>>>
>>> I think we should be calculating the population variance.
>>
>> Why population variance and not sample variance? In distributions
>> where the second moment about the mean exists, it's an unbiased
>> estimator of the variance. In this, it's different from the
>> population variance.
>
> Because we're actually measuring the whole population, and not a sample?

This.

The key incorrect word in David Fetter's statement is "estimator". We are
not estimating anything but rather providing descriptive statistics for a
defined population.
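
To make the distinction concrete, here is a minimal C sketch (illustrative only, not the patch itself) of accumulating mean and variance online with Welford's algorithm; the population and sample figures differ only in the final divisor:

#include <math.h>
#include <stdio.h>

typedef struct Stats
{
    long    n;        /* number of observations so far */
    double  mean;     /* running mean */
    double  sum_var;  /* running sum of squared deviations from the mean */
} Stats;

static void
stats_add(Stats *s, double x)
{
    double  delta = x - s->mean;

    s->n++;
    s->mean += delta / s->n;
    s->sum_var += delta * (x - s->mean);
}

int
main(void)
{
    Stats   s = {0, 0.0, 0.0};
    double  times[] = {1.2, 0.9, 4.7, 1.1, 2.3};  /* execution times, ms */
    int     i;

    for (i = 0; i < 5; i++)
        stats_add(&s, times[i]);

    /* Descriptive (population) variance: divide by n. */
    printf("population stddev: %g\n", sqrt(s.sum_var / s.n));

    /* Inferential (sample) variance: divide by n - 1 (Bessel's correction). */
    printf("sample stddev:     %g\n", sqrt(s.sum_var / (s.n - 1)));

    return 0;
}

Adding the next member is just another stats_add() call, which redefines the population and regenerates its descriptive statistics in one step.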

Users may extrapolate that the next member added to the population can be
expected to conform to this statistical description without bias (though
see below). Once that member arrives, we simply redefine the population to
include it and generate new descriptive statistics, which allows the
population's evolution to be captured in the statistics.

Currently (I think) we allow the end user to kill off the entire population
and build up from scratch. In the short term that limits the ability to
predict the attributes of future members, but once the population has again
reached a statistically significant size, new predictions will no longer be
skewed by population members whose attributes were defined in an older, and
possibly significantly different, environment. In theory it would be nice
to give the user the ability to specify - by time or percentage - a subset
of the population to leave alive; a hypothetical sketch follows.
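
Purely to illustrate that last idea, a hypothetical helper (stats_retain
and keep are invented names, not anything in pg_stat_statements; the struct
mirrors the accumulator sketched earlier) could scale down the accumulated
weight, approximating a percentage-based partial reset of the aggregates:

typedef struct Stats
{
    long    n;        /* number of observations so far */
    double  mean;     /* running mean */
    double  sum_var;  /* running sum of squared deviations from the mean */
} Stats;

/* Keep only a fraction `keep` (0..1) of the population's weight. */
static void
stats_retain(Stats *s, double keep)
{
    s->n = (long) (s->n * keep);  /* shrink the effective population size */
    s->sum_var *= keep;           /* shrink the accumulated dispersion */

    /*
     * The mean is unchanged, and the variance (sum_var / n) is preserved,
     * but new observations now move the statistics faster.
     */
}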

Actual time-weighted sampling would be an alternative, but likely one
significantly more difficult to accomplish. I really haven't dug too deeply
into the mechanics of the current code, but I don't see any harm in sharing
the thought.
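
For what it's worth, the usual way to time-weight an online accumulator is
an exponentially weighted mean and variance, as in this illustrative C
sketch (EwStats, ew_add, and alpha are invented names for the example, not
anything in the current code; the decay constant would need tuning):

#include <math.h>
#include <stdio.h>

typedef struct EwStats
{
    int     seen;   /* have we seen at least one observation? */
    double  mean;   /* exponentially weighted mean */
    double  var;    /* exponentially weighted variance */
} EwStats;

static void
ew_add(EwStats *s, double x, double alpha)
{
    double  delta;

    if (!s->seen)
    {
        s->seen = 1;
        s->mean = x;
        s->var = 0.0;
        return;
    }

    delta = x - s->mean;

    /* Standard exponentially weighted mean/variance update. */
    s->mean += alpha * delta;
    s->var = (1.0 - alpha) * (s->var + alpha * delta * delta);
}

int
main(void)
{
    EwStats s = {0, 0.0, 0.0};
    double  times[] = {1.2, 0.9, 4.7, 1.1, 2.3};  /* execution times, ms */
    int     i;

    for (i = 0; i < 5; i++)
        ew_add(&s, times[i], 0.2);

    printf("ew mean: %g  ew stddev: %g\n", s.mean, sqrt(s.var));
    return 0;
}

A larger alpha forgets history faster; an alpha near zero approaches the
all-time population statistics.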

David J.
