RE: [Proposal] Add accumulated statistics for wait event

From: Phil Florent <philflorent(at)hotmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics for wait event
Date: 2018-07-24 16:23:03
Message-ID: DB7PR03MB47302E540376BFE87A05CF6DBA550@DB7PR03MB4730.eurprd03.prod.outlook.com
Lists: pgsql-hackers


Hi,

> In some cases, sampling of events cannot find the cause of an issue. It loses detailed data.
> For example, a throughput issue occurs (e.g. disk I/O), but each wait
> lasts only a few milliseconds.

It loses non-meaningful details, and that is in fact a good thing. In this example, sampling will definitely find the cause while costing hardly any resources.

Defining a wait event as precisely as possible is very useful, but knowing the exact duration of each event is less useful for tuning.
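As a rough illustration of the point above, the following Python sketch simulates one backend whose IO waits each last only 1 to 4 ms, then samples its state every 50 ms. Everything in it is hypothetical (the timeline, the "CPU"/"IO" labels standing in for pg_stat_activity's wait_event_type, the interval); it is not how any particular tool works. Even though the sampling interval is much longer than any single wait, the share of samples that land in IO should come out close to the true share of time spent in IO.

```python
import random

def build_timeline(total_ms, rng):
    """Hypothetical backend activity in 1 ms ticks: short CPU bursts
    ("CPU" = active, no wait event) alternating with IO waits of 1-4 ms."""
    timeline = []
    while len(timeline) < total_ms:
        timeline += ["CPU"] * rng.randint(2, 10)  # on CPU for 2-10 ms
        timeline += ["IO"] * rng.randint(1, 4)    # brief IO wait, 1-4 ms
    return timeline[:total_ms]

def sampled_fraction(timeline, state, interval_ms):
    """Fraction of periodic samples (one every interval_ms) that see `state`."""
    samples = timeline[::interval_ms]
    return samples.count(state) / len(samples)

rng = random.Random(42)
timeline = build_timeline(60_000, rng)          # one minute of activity
true_io = timeline.count("IO") / len(timeline)  # exact share of time in IO
est_io = sampled_fraction(timeline, "IO", 50)   # 1200 samples, one per 50 ms
print(f"true IO share {true_io:.3f}, sampled estimate {est_io:.3f}")
```

Sampling longer or more often narrows the gap further; what sampling gives up is the duration of each individual wait, which is the detail argued above to matter less for tuning.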

Example of sampling, grouped and ordered by percentage of activity:

./t -d 5 -o "application_name, wait_event_type" -o "application_name, wait_event, wait_event_type"
traqueur 2.05.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to dedicated database ...
INFORMATION, connected to dedicated database traqueur
INFORMATION, PostgreSQL version : 110000
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
 busy_pc | distinct_exe | application_name | wait_event_type
---------+--------------+------------------+-----------------
     206 | 8 / 103      | mperf            |
      62 | 2 / 31       | mperf            | LWLock
      20 | 3 / 10       | mperf            | IO
      12 | 1 / 6        | mperf            | Client
(4 rows)

 busy_pc | distinct_exe | application_name |      wait_event       | wait_event_type
---------+--------------+------------------+-----------------------+-----------------
     206 | 8 / 103      | mperf            |                       |
      62 | 2 / 31       | mperf            | WALWriteLock          | LWLock
      14 | 1 / 7        | mperf            | DataFileImmediateSync | IO
      12 | 1 / 6        | mperf            | ClientRead            | Client
       2 | 1 / 1        | mperf            | DataFileWrite         | IO
       2 | 1 / 1        | mperf            | DataFileRead          | IO
       2 | 1 / 1        | mperf            | WALInitWrite          | IO
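The "group by/order by percentage of activity" step can be sketched as below. This is an assumption about the general approach, not traqueur's actual code: each sampling round records the active sessions (hypothetically as dicts keyed by pg_stat_activity column names), samples are counted per group, and busy_pc is taken as 100 * samples / rounds. Under that definition a group can exceed 100% when several sessions sit in it at the same time, which would be consistent with values such as 206 above.

```python
from collections import Counter

def busy_percent(rounds, keys):
    """Group sampled sessions and rank the groups by share of activity.

    rounds: list of sampling rounds; each round is a list of dicts, one per
            active session seen in that round (hypothetical layout).
    keys:   column names to group by, e.g. ("application_name", "wait_event_type").

    Returns (group, busy_pc) pairs, highest first, where busy_pc is
    100 * samples / number of rounds (so it may exceed 100).
    """
    counts = Counter(
        tuple(session.get(k) for k in keys)
        for sessions in rounds
        for session in sessions
    )
    n_rounds = len(rounds)
    return [(group, round(100 * c / n_rounds)) for group, c in counts.most_common()]

# Two hypothetical sampling rounds, two active sessions each.
rounds = [
    [{"application_name": "mperf", "wait_event_type": None},
     {"application_name": "mperf", "wait_event_type": "LWLock"}],
    [{"application_name": "mperf", "wait_event_type": None},
     {"application_name": "mperf", "wait_event_type": None}],
]
result = busy_percent(rounds, ("application_name", "wait_event_type"))
for group, pc in result:
    print(pc, group)
```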

There is no need to know the exact duration of each event to identify the bottleneck(s).

Best regards

Phil

________________________________
From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Sent: Tuesday, July 24, 2018 17:45
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Proposal] Add accumulated statistics for wait event

On 07/24/2018 12:06 PM, MyungKyu LIM wrote:
> 2018-07-23 16:53 (GMT+9), Michael Paquier wrote:
>> On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote:
>>> This proposal is about recording additional statistics of wait events.
>
>> I have comments about your patch. First, I don't think that you need to
>> count precisely the number of wait events triggered, as usually, when it
>> comes to analyzing a workload's bottleneck, what counts is a periodic
>> *sampling* of events, patterns which can already be fetched from
>> pg_stat_activity and stored, say, in a different place.
>
> Thanks for your feedback.
>
> This proposal is not about *sampling*.
> Accumulated wait event statistics are useful for solving issues.
> They provide accurate measurements.
>
> In some cases, sampling of events cannot find the cause of an issue. It loses detailed data.
> For example, a throughput issue occurs (e.g. disk I/O), but each wait
> lasts only a few milliseconds.
> In this case, it is highly likely that sampling will not find the cause.
>

I think it's highly likely that it will find the cause. The idea of
sampling is that while you don't measure the timing directly, you can
infer it from the frequency of the wait events in the samples. So if you
see that the backend reports a particular wait event in 75% of samples, it
probably spent 75% of its time waiting on it.
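To put a number on "probably", a sample-based share can be given an approximate confidence interval. The Python sketch below uses the normal approximation to the binomial, assuming the samples are independent (for example, the sampling interval does not line up with a periodic pattern in the workload); the function name and its defaults are hypothetical.

```python
import math

def time_share_estimate(hits, samples, z=1.96):
    """Estimate the share of time a backend spent in one wait event.

    hits:    number of samples in which the backend reported the wait event
    samples: total number of samples taken of that backend
    z:       normal quantile; 1.96 gives an approximate 95% interval

    Returns (estimate, low, high). The normal approximation is reasonable
    when there are many samples and the share is not close to 0 or 1.
    """
    p = hits / samples
    half_width = z * math.sqrt(p * (1 - p) / samples)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

print(time_share_estimate(75, 100))
print(time_share_estimate(750, 1000))
```

Under these assumptions, 75 hits out of 100 samples gives an interval of roughly 0.67 to 0.83, while 750 out of 1000 narrows it to roughly 0.72 to 0.78, so more samples buy precision without timing individual waits.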

I'm not saying sampling is perfect and it certainly is less convenient
than what you propose.
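For comparison, the accumulated-statistics approach amounts to something like the Python sketch below: every wait is timed and added to a per-event count and total. The class and method names are hypothetical, and this is not the patch under discussion, only an illustration of the idea.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class WaitStats:
    """Hypothetical per-backend accumulator: number of waits and total
    seconds spent, keyed by wait event name."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_s = defaultdict(float)

    @contextmanager
    def waiting(self, event):
        # Read the clock before and after the wait, then accumulate.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.calls[event] += 1
            self.total_s[event] += time.perf_counter() - start

stats = WaitStats()
for _ in range(5):
    with stats.waiting("DataFileRead"):
        time.sleep(0.002)  # stand-in for a short IO wait
print(stats.calls["DataFileRead"], round(stats.total_s["DataFileRead"], 3))
```

The result is an exact count and total duration per event, which is the convenience being weighed here; in this sketch the price is two clock reads and two counter updates on every wait, whereas sampling only reads wait state that is already being reported.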

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
