Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-09-12 12:59:35
Message-ID: c262e889315625e0fc0d77ca78fe2eac@postgrespro.ru
Lists: pgsql-hackers

On 11-09-2018 18:29, Fabien COELHO wrote:
> Hello Marina,
>
>> Hmm, but we can say the same for serialization or deadlock errors that
>> were not retried (the client test code itself could not run correctly
>> or the SQL sent was somehow wrong, which is also the client's fault),
>> can't we?
>
> I think not.
>
> If a client asks for something "legal", but some other client in
> parallel happens to make an incompatible change which results in a
> serialization or deadlock error, the clients are not responsible for
> the raised errors; they just happen to ask for something
> incompatible at the same time. So there is no user error per se, but
> the server is reporting its (temporary) inability to process what was
> asked for. For these errors, retrying is fine. If the client was
> alone, there would be no such errors, you cannot deadlock with
> yourself. This is really an isolation issue linked to parallel
> execution.

You can get other errors that cannot happen with only one client if you
use shell commands in meta commands:

starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
1.636 0 BEGIN;
1.497 0 \setshell var mkdir my_directory && echo 1
0.007 0 \sleep 1 us
1.465 0 \setshell var rmdir my_directory && echo 1
1.622 0 END;

starting vacuum...end.
mkdir: cannot create directory ‘my_directory’: File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution of
meta-command failed
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
2.731 0 BEGIN;
2.909 1 \setshell var mkdir my_directory && echo 1
0.231 0 \sleep 1 us
2.366 0 \setshell var rmdir my_directory && echo 1
2.664 0 END;
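
The script used in these two runs can be read off the per-statement
report; reassembled as a pgbench script (a reconstruction, since the
attached pgbench_meta_concurrent_error.sql is not shown in full), it
looks like:

```
BEGIN;
\setshell var mkdir my_directory && echo 1
\sleep 1 us
\setshell var rmdir my_directory && echo 1
END;
```

With a single client each mkdir is matched by its own rmdir, so no error
occurs; with two clients, one client's mkdir can run while the other
client's directory still exists, and pgbench reports the meta-command
failure. The runs above appear to use something like
pgbench -n -c 2 -j 1 -t 10 -f pgbench_meta_concurrent_error.sql (flags
inferred from the reported numbers of clients, threads and transactions).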

Or if you use untrusted procedural languages in SQL expressions (see the
attached file):

starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR: could not
create the directory "my_directory": File exists at line 3.
CONTEXT: PL/Perl anonymous code block

client 1 got an error in command 0 (SQL) of script 0; ERROR: could not
create the directory "my_directory": File exists at line 3.
CONTEXT: PL/Perl anonymous code block

transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
1.566 2 DO $$

starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
1.893 0 DO $$
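
The attached pgbench_concurrent_error.sql is only 287 bytes and starts
with the "DO $$" shown in the statement report; given the "PL/Perl
anonymous code block" CONTEXT line and the error text, it presumably
looks something like this (a sketch, assuming plperlu; the actual
attached file may differ):

```sql
DO $$
    use strict;
    -- fails with EEXIST when the other client's directory still exists
    mkdir "my_directory"
        or die qq{could not create the directory "my_directory": $!};
    rmdir "my_directory";
$$ LANGUAGE plperlu;
```

Perl's die appends the failing line number, which matches the reported
'File exists at line 3.'; as with the \setshell case, the error only
appears when two clients race on the same directory.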

Or if you try to create a function and perhaps replace an existing one:

starting vacuum...end.
client 0 got an error in command 0 (SQL) of script 0; ERROR: duplicate
key value violates unique constraint "pg_proc_proname_args_nsp_index"
DETAIL: Key (proname, proargtypes, pronamespace)=(my_function, , 2200)
already exists.

client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated

transaction type: pgbench_create_function.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failures: 10 (50.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 10 (50.000%)
maximum number of tries: 1
latency average = 82.881 ms (including failures)
tps = 12.065492 (including connections establishing)
tps = 12.092216 (excluding connections establishing)
statement latencies in milliseconds and failures:
82.549 10 CREATE OR REPLACE FUNCTION my_function()
RETURNS integer AS 'select 1;' LANGUAGE SQL;

>> Why not handle client errors that can occur (but may not always
>> occur) the same way? (For example, always abort the client, or
>> conversely never abort in these cases.) Here's an example of
>> such an error:
>
>> client 5 got an error in command 1 (SQL) of script 0; ERROR: division
>> by zero
>
> This is an interesting case. For me, we must stop the script because
> the client is asking for something "stupid", and retrying the same
> won't change the outcome; the division will still be by zero. It is
> the client's responsibility not to ask for something stupid: the bench
> script is buggy, and it should not submit illegal SQL queries. This is
> quite different from submitting something legal which happens to fail.
> ...
>>> I'm not sure that having "--debug" implying this option
>>> is useful: As there are two distinct options, the user may be allowed
>>> to trigger one or the other as they wish?
>>
>> I'm not sure that the main debugging output will give a good clue of
>> what's happened without full messages about errors, retries and
>> failures...
>
> I'm arguing more for letting the user decide what they want.
>
>> These lines are quite long - do you suggest to wrap them this way?
>
> Sure, if it is too long, then wrap.

Ok!

>>> Function getTransactionStatus name does not seem to correspond fully
>>> to what the function does. There is a passthru case which should be
>>> either avoided or clearly commented.
>>
>> I don't quite understand you - do you mean that in fact this function
>> finds out whether we are in a (failed) transaction block or not? Or do
>> you mean that the case of PQTRANS_INTRANS is also ok?...
>
> The former: although the function is named "getTransactionStatus", it
> does not really return the "status" of the transaction (aka
> PQstatus()?).

Thank you, I'll think about how to improve it. Perhaps the name
checkTransactionStatus would be better...

>>> I'd insist in a comment that "cnt" does not include "skipped"
>>> transactions
>>> (anymore).
>>
>> If you mean CState.cnt I'm not sure if this is practically useful
>> because the code uses only the sum of all client transactions
>> including skipped and failed... Maybe we can rename this field to
>> nxacts or total_cnt?
>
> I'm fine with renaming the field if it makes things clearer. They are
> all counters, so naming them "cnt" or "total_cnt" does not help much.
> Maybe "succeeded" or "success" to show what is really counted?

Perhaps renaming StatsData.cnt is better than just adding a comment to
this field. But IMO we have the same problem ("They are all counters,
so naming them 'cnt' or 'total_cnt' does not help much") with
CState.cnt, which cannot be given the same name because it also includes
skipped and failed transactions.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
pgbench_concurrent_error.sql text/plain 287 bytes
