pgbench calculates summary numbers a wrong way.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: pgbench calculates summary numbers a wrong way.
Date: 2020-09-17 08:41:54
Message-ID: 20200917.174154.1612932702426612205.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

While running pgbench, I found tps excluding connection somewhat
strange. To emphasize the behavior, I inserted some sleep at the end
of doConnect() and ran pgbench with several times.

$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 409
! latency average = 48.980 ms
> tps = 81.665671 (including connections establishing)
> tps = 102.323285 (excluding connections establishing)

$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 1628
! latency average = 12.303 ms
> tps = 325.123448 (including connections establishing)
> tps = 422.671015 (excluding connections establishing)

Why the latencies are so different? If I tell pgbench to measure the
latencies, it is about 8-9ms in both cases.

$ pgbench -T 5 -j 1 -c 4 -L100 (with 1s sleep in doConnect())
> number of transactions actually processed: 419
> number of transactions above the 100.0 ms latency limit: 0/419 (0.000 %)
! latency average = 8.984 ms
> latency stddev = 3.259 ms
> tps = 83.633694 (including connections establishing)
> tps = 104.909130 (excluding connections establishing)

$ pgbench -T 5 -j 4 -c 4 -L100 (with 1s sleep in doConnect())
> number of transactions actually processed: 2047
> number of transactions above the 100.0 ms latency limit: 0/2047 (0.000 %)
! latency average = 7.807 ms
> latency stddev = 2.916 ms
> tps = 408.677024 (including connections establishing)
> tps = 511.837509 (excluding connections establishing)

That's reasonable. So the latency estimation is apparently a kind of
bogus. It seems to me that the calculation is wrong to use
time_include. I think it should use time_exclude instead.

$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 407
! latency average = 11.674 ms
> tps = 81.262992 (including connections establishing)
> tps = 101.724507 (excluding connections establishing)
>

$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 1718
! latency average = 11.674 ms
> tps = 342.652333 (including connections establishing)
> tps = 428.850338 (excluding connections establishing)

Looks far better, but "tps excluding connection establishing" looks
strange. The reason seems to me threadRun() counts connection time a
wrong way. thread->conn_time is assumed to be the sum of the time took
by each client until it runs the first transaction. If each
connection took 1 second, conn_time of a thread that runs three
clients is calculated as 3 seconds. But in reality, if each connection
took 1 second, all clients on the thread waits for 3 seconds before
starting a transaction so it should be 3s * 3(clients/thread) = 9
seconds.

We could multiply conn_time by nstate on each thread, but the same
happens for -C (is_connect) mode. When a client took 1 second to
establish a connection, no other clients can do anything meanwhile.
Somewhat different from the !is_connect case precisely, but basically
the same at the point that making a connection of a client actually
blocks all other clients on the same thread.

Therefore I changed nclients in the calculation of time_exclude to
nthreads.

$ pgbench -T 5 -j 1 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 399
> latency average = 9.402 ms
> tps = 79.671581 (including connections establishing)
> tps = 425.460179 (excluding connections establishing)

$ pgbench -T 5 -j 4 -c 4 (with 1s sleep in doConnect())
> number of transactions actually processed: 1903
> latency average = 8.330 ms
> tps = 379.944770 (including connections establishing)
> tps = 480.211639 (excluding connections establishing)

Looks fine. Also the -C cases looks sainer.

(- is unpatched, + is patched)

> $ pgbench -T 5 -j 1 -c 4 -C (with 100ms sleep in doConnect())
### time_include=5.1, conn_total_time=4.9
- number of transactions actually processed: 43
- latency average = 469.850 ms
- tps = 8.513360 (including connections establishing)
- tps = 11.246110 (excluding connections establishing)
+ number of transactions actually processed: 38
+ latency average = 16.735 ms
+ tps = 7.493800 (including connections establishing)
+ tps = 239.022345 (excluding connections establishing)

> $ pgbench -T 5 -j 4 -c 4 -C (with 100ms sleep in doConnect())
### time_include=5.1, conn_total_time=18.3
-number of transactions actually processed: 172
-latency average = 117.571 ms
-tps = 34.021873 (including connections establishing)
-tps = 346.296591 (excluding connections establishing)
+ number of transactions actually processed: 172
+ latency average = 11.671 ms
+ tps = 33.851599 (including connections establishing)
+ tps = 342.716081 (excluding connections establishing)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Fix-latency-and-tps-calculation-of-pgbench.patch text/x-patch 2.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2020-09-17 08:43:52 Re: Yet another fast GiST build
Previous Message Konstantin Knizhnik 2020-09-17 08:40:59 Re: Built-in connection pooler