Re: [Proposal] Add accumulated statistics for wait event

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Phil Florent <philflorent(at)hotmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Add accumulated statistics for wait event
Date: 2018-07-26 01:24:51
Message-ID: 20180726012451.GA3927@paquier.xyz
Lists: pgsql-hackers

On Tue, Jul 24, 2018 at 04:23:03PM +0000, Phil Florent wrote:
> It loses non meaningful details and it's in fact a good point. In this
> example, sampling will definitely find the cause and won't cost
> resources.

The higher the sampling frequency, the more details you get, at the cost
of more load on the instance. If you could take an infinite number of
samples, where registering the same event multiple times for the same
backend also matters because its overall weight gets higher and it shows
up higher in profiles, then you would converge to the set of results
that this patch adds. The sampling method, especially its frequency, is
something controlled by the client and not the server. Approaches like
the one proposed here push the load onto the server side,
unconditionally, for *all* backends, and that has a cost.
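
For reference, client-side sampling can be as simple as something like
the following, run repeatedly from psql (for example with \watch 0.5),
aggregating the wait events visible in pg_stat_activity at whatever
frequency the client chooses.  Just a sketch:

    -- Snapshot of current wait events across all backends; run this
    -- repeatedly from the client to build a profile, no server-side
    -- accumulation needed.
    SELECT wait_event_type, wait_event, count(*) AS backends
      FROM pg_stat_activity
     WHERE wait_event IS NOT NULL
     GROUP BY wait_event_type, wait_event
     ORDER BY backends DESC;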

Sampling may indeed miss spiky workloads, but even with counters for
each event you would need to query the table holding them at an insane
frequency to perhaps get something out of it, because you still need to
sample the counters themselves to extract deltas.
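
To illustrate the point, here is a sketch of what consuming such
counters would look like.  The view name and columns are hypothetical,
just standing in for whatever the patch would expose; extracting
anything useful still means snapshotting and diffing at the client:

    -- Hypothetical view pg_stat_wait_events(wait_event_type,
    -- wait_event, calls), assumed here only for illustration.
    CREATE TEMP TABLE wait_snap AS
        SELECT wait_event_type, wait_event, calls
          FROM pg_stat_wait_events;
    -- ... wait for the interval of interest ...
    SELECT w.wait_event_type, w.wait_event,
           w.calls - s.calls AS delta_calls
      FROM pg_stat_wait_events w
      JOIN wait_snap s USING (wait_event_type, wait_event)
     ORDER BY delta_calls DESC;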

As Tomas has mentioned up-thread, sampling is light-weight, as is the
current design for wait events. Even if it is not perfect because it
cannot give exact numbers, it will find bottlenecks in most cases, and
that is what matters. If not, increasing the sampling frequency makes
things easier to detect. What would be the point of taking only one
sample per checkpoint, for example?

There may be a benefit in having counters; I don't know the answer to
that. The point would be to make sure that there is a specific set of
workloads where they make sense, but my gut feeling is that sampling
would be able to detect those anyway.

(I am not a computer scientist by training but a physicist, think fluid
dynamics and turbulence, and I have had my share of random events and
signal analysis as well. All of that is just statistics attempting to
approach reality, where sampling is a life-saver compared to exact
measurements.)

Adding hooks is not acceptable to me either: they have a cost, and it is
not clear what benefit we would get from putting hooks in such a place
for cases other than sampling and counters...
--
Michael
