Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: coelho(at)cri(dot)ensmp(dot)fr, thomas(dot)munro(at)gmail(dot)com, m(dot)polyakova(at)postgrespro(dot)ru, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2021-07-06 07:46:24
Message-ID: 20210706164624.dd22be9e4f10ed2657d25552@sraoss.co.jp
Lists: pgsql-hackers

Hello Ishii-san,

On Fri, 02 Jul 2021 09:25:03 +0900 (JST)
Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:

> I have found an interesting result from patched pgbench (I have set
> the isolation level to REPEATABLE READ):
>
> $ pgbench -p 11000 -c 10 -T 30 --max-tries=0 test
> pgbench (15devel, server 13.3)
> starting vacuum...end.
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 1
> query mode: simple
> number of clients: 10
> number of threads: 1
> duration: 30 s
> number of transactions actually processed: 2586
> number of failed transactions: 9 (0.347%)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> number of transactions retried: 1892 (72.909%)
> total number of retries: 21819
> latency average = 115.551 ms (including failures)
> initial connection time = 35.268 ms
> tps = 86.241799 (without initial connection time)
>
> I ran pgbench with 10 concurrent sessions. In this case pgbench always
> reports 9 failed transactions regardless of the setting of the -T
> option. This is because at the end of a pgbench session, only 1 out of
> 10 transactions succeeded; the other 9 failed due to serialization
> errors with no chance to retry because -T had expired.
>
> This is a little bit disappointing because I wanted to see a result
> where all transactions succeeded with retries. I tried -t instead of
> -T, but -t cannot be used with --max-tries=0.
>
> Also I think this behavior is somewhat inconsistent with the existing
> behavior of pgbench. When pgbench runs without the --max-tries option,
> it continues to run transactions even after -T expires:
>
> $ time pgbench -p 11000 -T 10 -f pgbench.sql test
> pgbench (15devel, server 13.3)
> starting vacuum...end.
> transaction type: pgbench.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 2
> maximum number of tries: 1
> latency average = 7009.006 ms
> initial connection time = 8.045 ms
> tps = 0.142674 (without initial connection time)
>
> real 0m14.067s
> user 0m0.010s
> sys 0m0.004s
>
> $ cat pgbench.sql
> SELECT pg_sleep(7);
>
> So pgbench does not stop the transaction after 10 seconds have passed
> but waits for the last transaction to complete. To be consistent with
> this behavior, shouldn't we also retry until the last transaction
> finishes when --max-tries=0?

I have changed the previous patch so that the -T option can terminate a
retrying transaction, and so that --max-tries=0 can be specified without
--latency-limit as long as -T is given, in accordance with the following
comment:

> Doc says "you cannot use an infinite number of retries without latency-limit..."
>
> Why should this be forbidden? At least if -T timeout takes precedent and
> shortens the execution, ISTM that there could be good reason to test that.
> Maybe it could be blocked only under -t if this would lead to an non-ending
> run.

Indeed, as Ishii-san pointed out, some users might not want retrying
transactions to be terminated by -T. However, the only practical downside
is that the number of failed transactions gets printed. The other results
users care about, such as tps, are almost unaffected because they are
measured only over successfully processed transactions. In fact, the
percentage of failed transactions is tiny, only 0.347%.

Under the existing behaviour, running transactions are never terminated by
the -T option. However, ISTM that this is based on the assumption that each
transaction's latency is small and the benchmark will therefore be able to
finish soon. On the other hand, when transactions can be retried without
limit, a run may take much longer than expected, and we cannot guarantee
that it will finish successfully within the given time. Therefore, under
unlimited retries it seems reasonable to end the benchmark by giving up on
retrying a transaction once the time has expired. In the sense that we
still do not forcibly terminate a transaction that is currently running,
this does not change the existing behaviour.

If we do not want to print the number of transactions that failed due to
-T, we could instead forbid using -T without --latency-limit under
--max-tries=0, to avoid a possibly never-ending benchmark. In that case,
users would have to bound the number of retries by specifying
--latency-limit or --max-tries (>0). However, for users who simply want to
benchmark with unlimited retries, combining -T and --max-tries=0 is the
most straightforward way, so I think it is better to allow them to be used
together.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
