RE: [Proposal] Add accumulated statistics

From: "imai(dot)yoshikazu(at)fujitsu(dot)com" <imai(dot)yoshikazu(at)fujitsu(dot)com>
To: "imai(dot)yoshikazu(at)fujitsu(dot)com" <imai(dot)yoshikazu(at)fujitsu(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Pavel Stehule' <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "yotsunaga(dot)naoki(at)fujitsu(dot)com" <yotsunaga(dot)naoki(at)fujitsu(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Phil Florent <philflorent(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics
Date: 2019-10-30 05:55:28
Message-ID: OSAPR01MB4609E9796B0B15DBFD6049DD94600@OSAPR01MB4609.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wed, Oct 30, 2019 at 5:51 AM, imai(dot)yoshikazu(at)fujitsu(dot)com wrote:
> The overhead induced by getting wait event info has been discussed since long ago, but I couldn't find any actual
> measurement results, so I want to measure the overhead myself.

Here is a patch which counts wait events and measures wait event times. It is currently at the POC stage and has several things to be improved.

You can get the wait event info by executing "select * from pg_stat_waitaccum;" and reset its counters with "select pg_stat_reset_shared('waitaccum');".

I tried to reduce the overhead of counting wait events. Since it is difficult to reduce the overhead of measuring wait event times, I made the time measurement configurable, like track_io_timing.
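
To illustrate the idea (this is only a sketch I am writing for explanation, not code from the patch; the WAEntry struct and the wait_begin/wait_end names are made up here), counting alone is a single increment, and the clock is read only when track_wait_timing is on:

#include <stdint.h>
#include <stdbool.h>
#include <time.h>

typedef struct WAEntry
{
    uint64_t    calls;      /* how many times this wait event fired */
    uint64_t    times;      /* accumulated wait time in microseconds */
} WAEntry;

static bool track_wait_timing = false;  /* GUC-like switch, as with track_io_timing */
static struct timespec wait_start;

static void
wait_begin(void)
{
    if (track_wait_timing)
        clock_gettime(CLOCK_MONOTONIC, &wait_start);
}

static void
wait_end(WAEntry *e)
{
    e->calls++;             /* counting is always on: just one increment */

    if (track_wait_timing)
    {
        struct timespec end;
        int64_t     us;

        clock_gettime(CLOCK_MONOTONIC, &end);
        us = (int64_t) (end.tv_sec - wait_start.tv_sec) * 1000000
            + (end.tv_nsec - wait_start.tv_nsec) / 1000;
        e->times += (uint64_t) us;
    }
}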

Another DBMS uses a lighter-weight method than gettimeofday or clock_gettime: the CPU cycle counter, rdtsc.
So I also created a patch which uses rdtsc for measuring wait event times.
There have been some investigations of rdtsc before [1]. If we want to use rdtsc, we need more investigation into which platforms can use it, how to keep its timestamps from going backwards, and so on.
Here I only wanted to see its overhead, so I didn't address those issues in this patch.
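
For reference, reading the TSC on x86_64 looks roughly like the following (a simplified sketch, not necessarily what the 0002 patch does; it returns raw ticks and does nothing about the cross-core skew or backwards-time problems mentioned above):

#include <stdint.h>

static inline uint64_t
read_tsc(void)
{
#if defined(__x86_64__)
    uint32_t    lo;
    uint32_t    hi;

    /* RDTSC loads the 64-bit cycle counter into EDX:EAX */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
#else
    return 0;                   /* platform without rdtsc */
#endif
}

Converting ticks to wall-clock time also needs the TSC frequency, which is yet another platform-specific problem.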

I measured the performance with pgbench on a VM which has a 4-core CPU and 8 GB of memory.
I tested the four cases below.

master:             master (2fc2a88)
only counts:        0001 patched; only counts wait events (track_wait_timing off)
counts/time:        0001 patched; counts wait events and measures their time with gettimeofday/clock_gettime (track_wait_timing on)
counts/time(rdtsc): 0001 + 0002 patched; counts wait events and measures their time with rdtsc (track_wait_timing on)

I executed the pgbench scripts below, initializing the database with scale 1 or 100 and running pgbench in standard mode or select-only mode.

[standard mode]
for i in `seq 1 15`
do
  pgbench -i -s (1 or 100) -q
  pgbench -c 8 -j 8 -n -T 60
done

[select only mode]
pgbench -i -s (1 or 100) -q
for i in `seq 1 10`
do
  pgbench -c 8 -j 8 -n -S -T 60
done

The results are below. diff(%) is (master TPS - patched TPS) / master TPS * 100, so a positive value means a slowdown relative to master.

[standard, scale 1]
 version            |     TPS | diff(%)
--------------------+---------+---------
 master             |  813.82 |  0
 only counts        |  797.48 |  2.01
 counts/time        |  833.16 | -2.38
 counts/time(rdtsc) |  876.29 | -7.68

[standard, scale 100]
 version            |     TPS | diff(%)
--------------------+---------+---------
 master             | 2170.34 |  0
 only counts        | 2125.37 |  2.07
 counts/time        | 2147.80 |  1.04
 counts/time(rdtsc) | 2187.37 | -0.785

[select-only, scale 1]
 version            |      TPS | diff(%)
--------------------+----------+---------
 master             | 28487.6  |  0
 only counts        | 28481.1  |  0.023
 counts/time        | 28364.7  |  0.431
 counts/time(rdtsc) | 28462.6  |  0.088

[select-only, scale 100]
 version            |      TPS | diff(%)
--------------------+----------+---------
 master             | 25767.89 |  0
 only counts        | 26068.65 | -1.167
 counts/time        | 25567.69 |  0.777
 counts/time(rdtsc) | 25525.26 |  0.942

An example of the wait event info after executing pgbench is shown below.

[standard, scale 100]
number of transactions actually processed: 129844
latency average = 3.697 ms
tps = 2163.667427 (including connections establishing)
tps = 2163.918398 (excluding connections establishing)

 wait_event_type |      wait_event      | calls  |   times
-----------------+----------------------+--------+-----------
 Client          | ClientRead           | 908807 | 114473878
 IO              | DataFileRead         | 216025 |   2867211
 LWLock          | WALWriteLock         | 191977 | 195192237
 IO              | DataFileWrite        | 154540 |   3406232
 IO              | WALWrite             |  49932 |   2728543
 IO              | WALSync              |  49737 |  49649308
 Lock            | transactionid        |   6209 |   8999545
 LWLock          | buffer_content       |   5337 |    288951
 IO              | DataFileExtend       |   2346 |     90375
 LWLock          | wal_insert           |   2013 |     25141
 LWLock          | WALBufMappingLock    |    630 |     14680
 LWLock          | CLogControlLock      |    454 |      2414
 LWLock          | buffer_mapping       |    170 |       852
 LWLock          | XidGenLock           |    146 |      3268
 LWLock          | lock_manager         |    141 |      5209
 Lock            | tuple                |    112 |    120163
 LWLock          | ProcArrayLock        |     97 |       495
 Lock            | extend               |     42 |     26875
 IO              | RelationMapRead      |     22 |        57
 LWLock          | CheckpointerCommLock |     18 |      1217
 IO              | DataFilePrefetch     |     18 |        24
 IPC             | ClogGroupUpdate      |      9 |      1867
 LWLock          | SyncRepLock          |      3 |       116
 IO              | DataFileTruncate     |      1 |       116

[select-only, scale 1]
number of transactions actually processed: 1682642
latency average = 0.285 ms
tps = 28043.407989 (including connections establishing)
tps = 28048.158085 (excluding connections establishing)

 wait_event_type |   wait_event    |  calls  |   times
-----------------+-----------------+---------+-----------
 Client          | ClientRead      | 1682661 | 287999638
 IO              | RelationMapRead |      22 |        54
 LWLock          | lock_manager    |       2 |      2087

[select-only, scale 100]
number of transactions actually processed: 1513536
latency average = 0.317 ms
tps = 25223.558569 (including connections establishing)
tps = 25228.820644 (excluding connections establishing)

 wait_event_type |   wait_event    |  calls  |   times
-----------------+-----------------+---------+-----------
 IO              | DataFileRead    | 2524682 |  14579531
 Client          | ClientRead      | 1513558 | 283968554
 LWLock          | buffer_mapping  |    1225 |      6392
 IO              | RelationMapRead |      22 |        46
 LWLock          | buffer_io       |      11 |       876
 LWLock          | lock_manager    |       6 |       507

I wanted to measure and compare the performance in cases where wait events occur many times, but unfortunately TPS was not stable across the standard-mode tests (in which wait events do occur many times), so I couldn't see a consistent performance tendency between the versions.

I need to find a more suitable test for clarifying the performance impact; if you know of any good tests, please let me know. Any testing is also very welcome.

--
Yoshikazu Imai

Attachment Content-Type Size
0001-Adding-the-pg_stat_waitaccum-view-which-shows-counts-v1.patch application/octet-stream 30.6 KB
0002-POC-Changed-measuring-method-of-wait-event-timed-v2.patch application/octet-stream 4.5 KB
