RE: [Proposal] Add accumulated statistics for wait event

From: Phil Florent <philflorent(at)hotmail(dot)com>
To: "Yotsunaga, Naoki" <yotsunaga(dot)naoki(at)jp(dot)fujitsu(dot)com>, 'Michael Paquier' <michael(at)paquier(dot)xyz>
Cc: 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>, "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics for wait event
Date: 2018-10-29 13:52:05
Message-ID: DB6PR0301MB22786C585EA07FC09FEF7553BAF30@DB6PR0301MB2278.eurprd03.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Is DBA really able to solve bottlenecks with sampling?

What I would like to say is that if we have information on the number of wait events and the wait time(like other DB), we can investigate more easily.

Yes you will be able to solve bottlenecks with sampling. In interactive mode, a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is normally OK. In batch mode I use 1s=>10s. If you want to visualize the results it's easy to use a dedicated tool and bottlenecks will clearly appear .
Since grafana is now able to connect directly to a postgresql source, I use it to display the information collected from pg_stat_activity and psutil ( e.g https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but panels are in english)

Other DB have accumulated statistics but you can notice that sampling is also their most modern method.
E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . Then you had "statspack". Those tools were based on accumulated statistics and the reports were based on differences between 2 points. It was useful to solve major problems but it was limited and not precise enough in many cases.

The preferred feature to identify bottlenecks in the Oracle world is now ASH (active session history). It can help with major problems, specific problems AND it can identify short blockages.
Too bad it is licensed as an option of their Enterprise Edition but similar tools exist and they are also based on sampling of the activity.

With the "official" ASH, sampling and archiving are done internally and you have a circular memory zone dedicated to the feature. Hence the overhead is lower but that's all.

The most advanced interactive tool is called "snapper" and it is also based on sampling.

Best regards
Phil

________________________________
De : Yotsunaga, Naoki <yotsunaga(dot)naoki(at)jp(dot)fujitsu(dot)com>
Envoyé : lundi 29 octobre 2018 02:20
À : 'Phil Florent'; 'Michael Paquier'
Cc : 'Tomas Vondra'; 'pgsql-hackers(at)lists(dot)postgresql(dot)org'
Objet : RE: [Proposal] Add accumulated statistics for wait event

On Thu, Oct 4, 2018 at 8:22 PM, Yotsunaga Naoki wrote:

Hi, I understood and thought of your statistic comment once again. In the case of sampling, is there enough statistic to investigate?

In the case of a long SQL, I think that it is possible to obtain a sufficient sampling number.

However, in the case of about 1 minute of SQL, only 60 samples can be obtained at most.

#Because legard’s comment.

https://www.postgresql.org/message-id/1539158356795-0.post%40n3.nabble.com

Does this sampling number of 60 give information that I really want?

Perhaps it is not to miss the real problem part?

---------------------------------------

Naoki, Yotsunaga.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message MichaelDBA 2018-10-29 13:54:56 pgadmin4 and scram
Previous Message Andreas 'ads' Scherbaum 2018-10-29 12:01:47 Re: INSTALL file