RE: [Proposal] Add accumulated statistics for wait event

From: Phil Florent <philflorent(at)hotmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Yotsunaga, Naoki" <yotsunaga(dot)naoki(at)jp(dot)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics for wait event
Date: 2018-10-04 12:54:28
Message-ID: DB6PR0301MB22789286025B8F222E3CE770BAEA0@DB6PR0301MB2278.eurprd03.prod.outlook.com
Lists: pgsql-hackers

Hi,

It's the same logic as with any polling system. An integral estimated with the Monte Carlo method from only a few points won't be accurate enough, and can even be completely wrong.
Polling is OK for troubleshooting a problem on the fly, but 2 points are not enough. A few seconds are needed to obtain good enough data, e.g. 5-10 seconds of polling with an interval of 0.1 s down to 0.01 s between two queries of the activity.
Polling for a few seconds while the user is waiting is normally enough to tell whether a significant part of the wait is spent in the database. It's very important to know that. With 1 hour of accumulated statistics, a DBA will always see something to fix. But if the user waits 10 seconds on a particular screen and only 1 second of that is spent in the database, it often won't help directly.
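To illustrate why two snapshots are not enough, here is a minimal Python simulation, not taken from any real tool. It assumes a hypothetical 10-second user wait containing 1 second of database time, split into ten 0.1 s pieces at random offsets; snapshots taken at a fixed interval stand in for repeated queries of the activity (in PostgreSQL 10 the pg_stat_activity view exposes state, wait_event_type and wait_event for this). The workload shape and all function names are assumptions made for the example.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open [start, end) in seconds


def make_db_intervals(total_s: float, db_s: float, chunks: int,
                      rng: random.Random) -> List[Interval]:
    """Hypothetical workload: spread db_s seconds of database time over
    `chunks` equal, non-overlapping pieces inside a total_s-second user wait."""
    size = db_s / chunks
    slot = total_s / chunks  # one piece per slot, so pieces never overlap
    if size > slot:
        raise ValueError("database time cannot exceed total time")
    intervals = []
    for i in range(chunks):
        start = i * slot + rng.uniform(0.0, slot - size)
        intervals.append((start, start + size))
    return intervals


def in_db(t: float, intervals: List[Interval]) -> bool:
    """True if the simulated session is busy in the database at time t."""
    return any(start <= t < end for start, end in intervals)


def poll_fraction(intervals: List[Interval], total_s: float,
                  interval_s: float, rng: random.Random) -> Tuple[float, int]:
    """Take one snapshot every interval_s seconds, starting at a random phase,
    and return (fraction of snapshots that saw database activity, snapshot count)."""
    phase = rng.uniform(0.0, interval_s)
    hits = samples = 0
    k = 0
    while phase + k * interval_s < total_s:
        samples += 1
        hits += in_db(phase + k * interval_s, intervals)
        k += 1
    return hits / samples, samples


if __name__ == "__main__":
    rng = random.Random(42)
    # A 10 s wait on a screen, 1 s of it spent in the database.
    db = make_db_intervals(10.0, 1.0, 10, rng)
    for interval_s in (5.0, 0.1, 0.01):
        estimate, n = poll_fraction(db, 10.0, interval_s, rng)
        print(f"interval={interval_s}s samples={n} estimated DB share={estimate:.3f}")
```

With a 5 s interval only two snapshots are taken, so the estimate can only be 0, 0.5 or 1. With a 0.01 s interval about 1000 snapshots are taken and the estimate lands close to the true 10%.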
Polling gives great information with PostgreSQL 10, but it was already useful in older versions to catch top queries, etc.
I always check, using known cases, whether my tool reports activity adequately. I want to be sure it will report things adequately in real-world troubleshooting sessions. Sometimes there are bugs in my tool; once there was an issue in PostgreSQL itself (pgstat_report_activity() was not called by workers during parallel index creation).

Best regards
Phil

From: Michael Paquier <michael(at)paquier(dot)xyz>
Sent: Thursday, October 4, 2018 12:58
To: Phil Florent
Cc: Yotsunaga, Naoki; Tomas Vondra; pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Proposal] Add accumulated statistics for wait event

On Thu, Oct 04, 2018 at 09:32:37AM +0000, Phil Florent wrote:
> I am a DB beginner, so please tell me. It says that you can find
> events that are bottlenecks by sampling, but as you saw above, you
> cannot find events shorter than the sampling interval, right?

Yes, which is why it would be as simple as making the interval shorter,
though not so short that it bloats the amount of information fetched,
which needs to be stored and afterwards (perhaps) processed for analysis.
This gets rather close to signal processing. A simple picture, for
example: assume that event A happens 100 times in an interval of 1s,
and event B only once in the same interval of 1s. If the snapshot
interval is also 1s, then in the worst case A would be treated as equal
to B, which would be wrong.
--
Michael
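Michael's A/B example above can be reproduced with a small simulation. The timings are assumptions chosen for illustration: each occurrence lasts 1 ms, A fires every 10 ms, and B fires once at 505 ms, so the two never overlap. Times are integer microseconds to avoid floating-point edge cases; the function names are hypothetical and nothing here calls PostgreSQL.

```python
from collections import Counter
from typing import List, Optional, Tuple

Event = Tuple[str, int, int]  # (name, start_us, end_us), half-open interval

WINDOW_US = 1_000_000  # the 1 s interval from the example


def build_window() -> List[Event]:
    """Hypothetical timeline: A runs 100 times (1 ms every 10 ms), B once."""
    events = [("A", k * 10_000, k * 10_000 + 1_000) for k in range(100)]
    events.append(("B", 505_000, 506_000))  # falls between two A occurrences
    return events


def accumulated_counts(events: List[Event]) -> Counter:
    """What a cumulative per-event counter would report for the window."""
    return Counter(name for name, _, _ in events)


def active_at(events: List[Event], t_us: int) -> Optional[str]:
    """Name of the event in progress at t_us, or None if the session is idle."""
    for name, start, end in events:
        if start <= t_us < end:
            return name
    return None


def sampled_counts(events: List[Event], first_us: int, interval_us: int) -> Counter:
    """Snapshot the session every interval_us starting at first_us and
    count which event each snapshot caught."""
    seen = Counter()
    for t in range(first_us, WINDOW_US, interval_us):
        name = active_at(events, t)
        if name is not None:
            seen[name] += 1
    return seen


if __name__ == "__main__":
    events = build_window()
    print("accumulated:", accumulated_counts(events))
    # One snapshot per 1 s: the result depends entirely on where it lands.
    print("1 s snapshot at 505.5 ms:", sampled_counts(events, 505_500, WINDOW_US))
    print("1 s snapshot at 0.5 ms:", sampled_counts(events, 500, WINDOW_US))
    # One snapshot per 1 ms recovers the 100:1 ratio in this example.
    print("1 ms snapshots:", sampled_counts(events, 0, 1_000))
```

A single snapshot in the 1 s window reports either one A or one B, never the 100:1 ratio that a cumulative counter shows. Sampling every 1 ms recovers it here only because every occurrence lasts the same 1 ms; in general, sampling measures time spent in an event rather than how many times it occurred.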
