Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-09-08 13:03:51
Message-ID: alpine.DEB.2.21.1809081450100.10506@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Marina,

> v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
> - the main patch for handling client errors and repetition of transactions
> with serialization/deadlock failures (see the detailed description in the
> file).

About patch v11-3.

Patch applies cleanly on top of the other two. Compiles, global and local
"make check" are ok.

* Features

As far as the actual retry feature is concerned, I'd say we are nearly
there. However I have an issue with changing the behavior on meta command
and other sql errors, which I find not desirable.

When a meta-command fails, before the patch the command is aborted and
there is a convenient error message:

sh> pgbench -T 10 -f bad-meta.sql
bad-meta.sql:1: unexpected function name (false) in command "set" [...]
\set i false + 1 [...]

After the patch it is simply counted, pgbench loops on the same error till
the time is completed, and there are no clue about the actual issue:

sh> pgbench -T 10 -f bad-meta.sql
starting vacuum...end.
transaction type: bad-meta.sql
duration: 10 s
number of transactions actually processed: 0
number of failures: 27993953 (100.000%)
...

Same thing about SQL errors, an immediate abort...

sh> pgbench -T 10 -f bad-sql.sql
starting vacuum...end.
client 0 aborted in command 0 of script 0; ERROR: syntax error at or near ";"
LINE 1: SELECT 1 + ;

... is turned into counting without aborting nor error messages, so that
there is no clue that the user was asking for something bad.

sh> pgbench -T 10 -f bad-sql.sql
starting vacuum...end.
transaction type: bad-sql.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 0
number of failures: 274617 (100.000%)
# no clue that there was a syntax error in the script

I do not think that these changes of behavior are desirable. Meta command and
miscellaneous SQL errors should result in immediatly aborting the whole run,
because the client test code itself could not run correctly or the SQL sent
was somehow wrong, which is also the client's fault, and the server
performance bench does not make much sense in such conditions.

ISTM that the focus of this patch should only be to handle some server
runtime errors that can be retryed, but not to change pgbench behavior on
other kind of errors. If these are to be changed, ISTM that it would be a
distinct patch and would require some discussion, and possibly an option
to enable it or not if some use case emerge. AFA this patch is concerned,
I'd suggest to let that out.

Doc says "you cannot use an infinite number of retries without latency-limit..."

Why should this be forbidden? At least if -T timeout takes precedent and
shortens the execution, ISTM that there could be good reason to test that.
Maybe it could be blocked only under -t if this would lead to an non-ending
run.

As "--print-errors" is really for debug, maybe it could be named
"--debug-errors". I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?

* Code

The following remarks are linked to the change of behavior discussed above:
makeVariableValue error message is not for debug, but must be kept in all
cases, and the false returned must result in an immediate abort. Same thing about
lookupCreateVariable, an invalid name is a user error which warrants an immediate
abort. Same thing again about coerce* functions or evalStandardFunc...
Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable.

sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff is
really about the transaction script, not dealing with errors, esp as there is no
significant advantage in preparing a "ROLLBACK" statement which is short and has
no parameters. I'd suggest to remove this function and just issue
PQsendQuery("ROLLBACK;") in all cases.

In copyVariables, I'd simplify

+ if (source_var->svalue == NULL)
+ dest_var->svalue = NULL;
+ else
+ dest_var->svalue = pg_strdup(source_var->svalue);

as:

dest_var->value = (source_var->svalue == NULL) ? NULL : pg_strdup(source_var->svalue);

+ if (sqlState) -> if (sqlState != NULL) ?

Function getTransactionStatus name does not seem to correspond fully to what the
function does. There is a passthru case which should be either avoided or
clearly commented.

About:

- commandFailed(st, "SQL", "perhaps the backend died while processing");
+ clientAborted(st,
+ "perhaps the backend died while processing");

keep on one line?

About:

+ if (doRetry(st, &now))
+ st->state = CSTATE_RETRY;
+ else
+ st->state = CSTATE_FAILURE;

-> st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;

* Comments

"There're different types..." -> "There are different types..."

"after the errors and"... -> "after errors and"...

"the default value of max_tries is set to 1" -> "the default value
of max_tries is 1"

"We cannot retry the transaction" -> "We cannot retry a transaction"

"may ultimately succeed or get a failure," -> "may ultimately succeed or fail,"

Overall, the comment text in StatsData is very clear. However they are not
clearly linked to the struct fields. I'd suggest that earch field when used
should be quoted, so as to separate English from code, and the struct name
should always be used explicitely when possible.

I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).

* Documentation:

Some suggestions which may be improvements, although I'm not a native English
speaker.

ISTM that there are too many "the":
- "turns on the option ..." -> "turns on option ..."
- "When the option ..." -> "When option ..."
- "By default the option ..." -> "By default option ..."
- "only if the option ..." -> "only if option ..."
- "combined with the option ..." -> "combined with option ..."
- "without the option ..." -> "without option ..."
- "is the sum of all the retries" -> "is the sum of all retries"

"infinite" -> "unlimited"

"not retried at all" -> "not retried" (maybe several times).

"messages of all errors" -> "messages about all errors".

"It is assumed that the scripts used do not contain" ->
"It is assumed that pgbench scripts do not contain"

About v11-4. I'm do not feel that these changes are very useful/important
for now. I'd propose that your prioritize on updating 11-3 so that we can
have another round about it as soon as possible, and keep that one later.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Szima Gábor 2018-09-08 15:20:15 RULE does not like the NOT EXISTS condition
Previous Message Simon Riggs 2018-09-08 12:52:40 Re: StandbyAcquireAccessExclusiveLock doesn't necessarily