Re: Add min and max execute statement time in pg_stat_statement

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2013-10-23 02:19:07
Message-ID: 5267321B.9030909@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,

(2013/10/22 22:26), Stephen Frost wrote:
> * Dimitri Fontaine (dimitri(at)2ndQuadrant(dot)fr) wrote:
>> In our case, what I keep experiencing with tuning queries is that we
>> have like 99% of them running under acceptable threshold and 1% of them
>> taking more and more time.
>
> This is usually described (at least where I come from) as 'rare events',
> which goes to Tom's point that averages, stddev, etc, are not ideal
> (though they are still better than nothing).
>
>>> good reason to suppose that query runtime is Gaussian? (I'd bet not;
>>> in particular, multimodal behavior seems very likely due to things like
>>> plan changes.) If not, how much does that affect the usefulness of
>>> a standard-deviation calculation?
>
> Oscillating plan changes may fit multimodal but I don't feel that's
> typical. My experience has been it's either an extremely rare plan
> difference or it's a shift from one plan to another over time.
After all, all of avg, min, max and stdev are only numerical value for predicting
model. There aren't the robustness and strictness such as Write Ahead Logging. It
resembles a weather forecast. They are still better than nothing.
It is needed a human judgment to finally suppose a cause from the numerical
values. By the way, we can guess probability of the value from stdev.
Therefore we can guess easily even if there is an extreme value in min/max
whether it is normal or not.

>> What I've been gathering from my quick chat this morning is that either
>> you know how to characterize the distribution and then the min max and
>> average are useful on their own, or you need to keep track of an
>> histogram where all the bins are of the same size to be able to learn
>> what the distribution actually is.
>
> A histogram would certainly be useful. We may also wish to look into
> outlier/rare event detection methods and increase the logging we do in
> those cases (if possible).
>
>> Of course keeping enough stats seems to always begin with keeping the
>> min, max and average, so we can just begin there. We would just be
>> unable to answer interesting questions with just that.
>
> It would probably be good to do some research into techniques for
> outlier detection which minimizes CPU and storage cost.
pg_stat_statement is often used in operating database system, so I don't
like high CPU usage implementation. The software which will be lessor
performance just to install it is too unpleasant to accept. And if we
need more detail information for SQL tuning, it would be better to
develop other useful performance tuning and monitoring contrib not to
use in operating database system.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-10-23 04:14:09 Re: Compression of full-page-writes
Previous Message Tom Lane 2013-10-23 02:07:27 Sigh, my old HPUX box is totally broken by DSM patch