RE: [Proposal] Add accumulated statistics for wait event

From: "Yotsunaga, Naoki" <yotsunaga(dot)naoki(at)jp(dot)fujitsu(dot)com>
To: 'Phil Florent' <philflorent(at)hotmail(dot)com>, 'Michael Paquier' <michael(at)paquier(dot)xyz>
Cc: 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics for wait event
Date: 2018-11-01 08:01:32
Message-ID: 8E9126CB6CE2CD42962059AB0FBF7B0DC5F6E8@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 29, 2018 at 1:52 AM, Phil Florent wrote:

Hi, thank you for comments.

>Yes you will be able to solve bottlenecks with sampling. In interactive mode, a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is normally OK.

With the tool you are using, can you sample at intervals shorter than 1 second?
If you can, you can get enough sampling number and you can also acquire short events.

>Since grafana is now able to connect directly to a postgresql source, I use it to display the information collected from pg_stat_activity and psutil ( e.g https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but panels are in english)

It is wonderful to visualize.
Especially for beginners like me.

>Other DB have accumulated statistics but you can notice that sampling is also their most modern method.
>E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . Then you had "statspack". Those tools were based on accumulated statistics and the reports were based on differences between 2 points. It was useful to solve major problems but it was limited and not precise enough in many cases.

>The preferred feature to identify bottlenecks in the Oracle world is now ASH (active session history). It can help with major problems, specific problems AND it can identify short blockages.
>Too bad it is licensed as an option of their Enterprise Edition but similar tools exist and they are also based on sampling of the activity.

>With the "official" ASH, sampling and archiving are done internally and you have a circular memory zone dedicated to the feature. Hence the overhead is lower but that's all.

>The most advanced interactive tool is called "snapper" and it is also based on sampling.

Thanks. I will check it.

The current bottleneck survey method, from sampling, I can know the number (ratio) of waiting events.
Then, investigate from those with a high number of times (ratio).
Do you agree with this recognition?

---------------------------------------

Naoki, Yotsunaga.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2018-11-01 08:15:18 Re: PostgreSQL Limits and lack of documentation about them.
Previous Message Vincent Mirian 2018-11-01 07:58:44 Hooks to Modify Execution Flow and Query Planner