RE: [Proposal] Add accumulated statistics for wait event

From: Phil Florent <philflorent(at)hotmail(dot)com>
To: "Yotsunaga, Naoki" <yotsunaga(dot)naoki(at)jp(dot)fujitsu(dot)com>, 'Michael Paquier' <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics for wait event
Date: 2018-10-04 09:32:37
Message-ID: DB6PR0301MB2278A7E8D4503134FA9C7061BAEA0@DB6PR0301MB2278.eurprd03.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
I am a DB beginner, so please tell me. It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right?

If an event occurs frequently and if it is reported in pg_stat_activity, you will catch it again and again while sampling, no matter it duration.
Hence you just need to

* Sample the sessions and consider the active ones. You need to know if they are waiting (PostgreSQL now provides detailed wait events) or if they are on the CPU
* Optionally collect information on the system context at the time of sampling (CPU, memory...), it can be provided by many tools like psutil python library for example

If the client application itself provides information it's even more interesting. With something like program/module/action/client_info/sofar/totalwork in application_name you are able to focus directly on different kind of activity. It can give you information like "I/O waits are meaningful for my batch activity but not for my OLTP activity, if my goal is to improve response time for end users I have to consider that."

Best regards
Phil

________________________________
De : Yotsunaga, Naoki <yotsunaga(dot)naoki(at)jp(dot)fujitsu(dot)com>
Envoyé : jeudi 4 octobre 2018 10:31
À : 'Michael Paquier'; Phil Florent
Cc : Tomas Vondra; pgsql-hackers(at)lists(dot)postgresql(dot)org
Objet : RE: [Proposal] Add accumulated statistics for wait event

On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote:
> Even if you have spiky workloads, sampling may miss those, but even with adding counters for each event
> you would need to query the table holding the counters at an insane frequency to be able to perhaps get
> something out of it as you need to do sampling of the counters as well to extract deltas.

Hi, I was wondering why PostgreSQL did not have the number of wait events and wait time that other databases such as Oracle had as a function, and when I was looking for related threads, I got to this thread.

I am a DB beginner, so please tell me. It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right?
If this short event has occurred quite a number of times and it was a considerable amount of time in total, can you solve this problem with sampling?
# I have asked, but I could not understand much of the discussion above and I do not know if such a case can exist.
Also, I think that it can be solved by higher the sampling frequency, but the load will be high, right? I think this method is not very practical.

Moreover, I think that it is not implemented due to the reason that sampling is good as described above and because it affects performance.
How about switching the function on / off and implementing with default off?
Do you care about performance degradation during bottleneck investigation?
When investigating the bottleneck, I think that it is better to find the cause even at the expense of performance.
# If you can detect with high frequency sampling, I think that it depends on whether the sampling or the function(the number of wait events and wait time) is high load.

Since I am a DB beginner, I think that it is saying strange things.
I am glad if you tell me.

-----
Naoki Yotsunaga

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2018-10-04 09:53:45 Re: libpq compression
Previous Message Andreas Karlsson 2018-10-04 09:22:32 Re: Early WIP/PoC for inlining CTEs