RE: [Proposal] Add accumulated statistics

From: "imai(dot)yoshikazu(at)fujitsu(dot)com" <imai(dot)yoshikazu(at)fujitsu(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Pavel Stehule' <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "yotsunaga(dot)naoki(at)fujitsu(dot)com" <yotsunaga(dot)naoki(at)fujitsu(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Phil Florent <philflorent(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics
Date: 2019-10-30 05:51:11
Message-ID: OSAPR01MB46097457DD12DF85A3C42F4794600@OSAPR01MB4609.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi,

On Tue, Jan 15, 2019 at 1:14 AM, Tsunakawa, Takayuki wrote:

[ ... absent for a long time ]

I read the discussions of this thread.

If we want wait event info, we can currently sample pg_stat_activity and derive a pseudo total duration for each wait event.
(I understand that wait event sampling does *not* give wait event counts; it only approximates wait event durations.)
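As an aside, such a sampler can be as simple as polling pg_stat_activity in a loop. Below is a minimal sketch in C using libpq (the file name, the 10 ms interval, and the 10 s run length are my own choices, not from any patch); piping its output through "sort | uniq -c" gives per-event sample counts, where each sample stands for roughly 10 ms of pseudo wait time.

/*
 * Minimal wait-event sampler -- an illustrative sketch only.
 * Build with: cc sampler.c -o sampler -I$(pg_config --includedir) -lpq
 * Connection parameters are taken from the usual PG* environment variables.
 */
#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    for (int i = 0; i < 1000; i++)   /* 1000 samples x 10 ms = ~10 s */
    {
        PGresult   *res = PQexec(conn,
                                 "SELECT wait_event_type, wait_event "
                                 "FROM pg_stat_activity "
                                 "WHERE wait_event IS NOT NULL");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            /* one output line per sampled backend wait */
            for (int r = 0; r < PQntuples(res); r++)
                printf("%s\t%s\n",
                       PQgetvalue(res, r, 0), PQgetvalue(res, r, 1));
        }
        PQclear(res);
        usleep(10 * 1000);           /* 10 ms sampling interval */
    }

    PQfinish(conn);
    return 0;
}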

It is said that, in most cases, sampling is enough to find the cause of an issue. On the other hand, it is also said that sampling misses wait events on which the process does not spend much time. Can we really say we never need those seemingly unimportant wait events?

Also, while we can derive a pseudo total duration from sampling, we cannot know how many times each wait event occurred or how long it took on average each time. I think it would be better to also know the average wait time when investigating the cause of an issue: a total of 100 ms of waiting could be a single 100 ms wait or ten thousand 10 us waits, and those point to very different problems.

For example, the counts, total durations, and average times of each wait event after running pgbench, obtained from a server with my patch applied, look like this:

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 8
duration: 60 s
number of transactions actually processed: 132259
latency average = 3.630 ms
tps = 2203.909727 (including connections establishing)
tps = 2204.248691 (excluding connections establishing)

 wait_event_type |      wait_event      | calls  |   times   | times/calls
-----------------+----------------------+--------+-----------+------------
 Client          | ClientRead           | 925732 | 115270950 |      124.52
 IO              | DataFileRead         | 220357 |   3219893 |       14.61
 LWLock          | WALWriteLock         | 195560 | 192684987 |      985.30
 IO              | DataFileWrite        | 156208 |   4440091 |       28.42
 IO              | WALWrite             |  51035 |   2925808 |       57.33
 IO              | WALSync              |  50812 |  49626240 |      976.66
 Lock            | transactionid        |   6179 |   9905727 |     1603.13
 LWLock          | buffer_content       |   5453 |    295123 |       54.12
 IO              | DataFileExtend       |   2367 |     90430 |       38.20
 LWLock          | wal_insert           |   2053 |     33620 |       16.38
 LWLock          | WALBufMappingLock    |   1301 |     23761 |       18.26
 LWLock          | CLogControlLock      |    514 |      3959 |        7.70
 LWLock          | buffer_mapping       |    162 |      1313 |        8.10
 LWLock          | XidGenLock           |    153 |      2399 |       15.68
 LWLock          | lock_manager         |    122 |      5051 |       41.40
 Lock            | tuple                |    105 |    114632 |     1091.73
 LWLock          | ProcArrayLock        |     99 |      1828 |       18.46
 LWLock          | CheckpointerCommLock |     33 |       854 |       25.88
 Lock            | extend               |     24 |     20493 |      853.88
 IO              | RelationMapRead      |     22 |        71 |        3.23
 LWLock          | buffer_io            |     11 |      7398 |      672.55
 IO              | SLRUWrite            |      4 |       249 |       62.25
 IO              | SLRUSync             |      4 |         0 |        0.00
 IPC             | ClogGroupUpdate      |      3 |       223 |       74.33
 LWLock          | SyncRepLock          |      1 |         1 |        1.00

We can see that WALWriteLock has the longest total duration (times), with ClientRead next. We could probably get that much from sampling. But we can also see that WALWriteLock's total duration comes from its high average time, whereas ClientRead's comes from its large number of calls. We can further see that transactionid, tuple, WALWriteLock, WALSync, and extend have longer average times than the others. If we want to improve performance even a little, I think such information is useful.

The patch proposed in this thread was not accepted because it imposes overhead on the server/backend.
The functions that measure wait event time, gettimeofday() and clock_gettime(), take time to execute, and they would sit on a frequently called path (pgstat_report_wait_start/end), so introducing them there is expected to add a lot of overhead.
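To make the concern concrete, here is an illustrative sketch, not the actual PostgreSQL code and not the exact shape of my patch (WaitStats, wait_start() and wait_end() are hypothetical names), of what per-wait timing adds to that path: two clock reads bracketing every single wait, however short.

/*
 * Illustrative sketch only -- not PostgreSQL source code.
 * WaitStats, wait_start() and wait_end() are hypothetical.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef struct WaitStats
{
    uint64_t    calls;      /* how many times the event occurred */
    uint64_t    total_ns;   /* accumulated wait time in nanoseconds */
} WaitStats;

static WaitStats wait_stats[256];   /* hypothetical: one slot per wait event */
static struct timespec wait_begin;

static void
wait_start(void)
{
    /* extra clock read #1, taken before every wait */
    clock_gettime(CLOCK_MONOTONIC, &wait_begin);
}

static void
wait_end(int event)
{
    struct timespec now;
    int64_t     delta;

    /* extra clock read #2, taken after every wait */
    clock_gettime(CLOCK_MONOTONIC, &now);
    delta = (int64_t) (now.tv_sec - wait_begin.tv_sec) * 1000000000
        + (now.tv_nsec - wait_begin.tv_nsec);
    wait_stats[event].calls++;
    wait_stats[event].total_ns += (uint64_t) delta;
}

int
main(void)
{
    wait_start();
    /* ... the wait being measured would happen here ... */
    wait_end(0);
    printf("calls=%" PRIu64 " total_ns=%" PRIu64 "\n",
           wait_stats[0].calls, wait_stats[0].total_ns);
    return 0;
}

Since the report functions are called around every lock acquisition and I/O, those two clock reads would be paid on every one of the hundreds of thousands of waits in a run like the pgbench one above.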

Even if we only count wait events, without timing them, there would still be overhead, as pointed out in another thread. [1]

The overhead induced by collecting wait event info has been discussed for a long time, but I could not find actual measurement results, so I want to measure the overhead myself.
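As a first data point, the cost of the clock call itself can be measured in isolation with a stand-alone loop like the one below (the loop count is arbitrary; older glibc may need -lrt). PostgreSQL's pg_test_timing utility makes a similar measurement.

/*
 * Rough microbenchmark of clock_gettime() -- a sketch, not a rigorous
 * benchmark.  Build with: cc -O2 clockbench.c -o clockbench
 */
#include <stdio.h>
#include <time.h>

#define LOOPS 10000000L

int
main(void)
{
    struct timespec begin, now, end;

    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (long i = 0; i < LOOPS; i++)
        clock_gettime(CLOCK_MONOTONIC, &now);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double) (end.tv_sec - begin.tv_sec) * 1e9
        + (end.tv_nsec - begin.tv_nsec);

    printf("%.1f ns per clock_gettime() call\n", elapsed_ns / LOOPS);
    return 0;
}

If one call costs tens of nanoseconds, two calls per wait on a path executed this frequently add up quickly, which is consistent with the concern above.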

[1] https://www.postgresql.org/message-id/CA%2BTgmobf1NJD%2B_DfQG5qccG5YFSnxk3CgC2mh0-UHabznCQtYA%40mail.gmail.com

--
Yoshikazu Imai
