Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-08-17 07:49:08
Message-ID: alpine.DEB.2.21.1808170917510.20841@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Marina,

>> Detailed -r report. I understand from the doc that the retry number on
>> the detailed per-statement report is to identify at what point errors
>> occur? Probably this is more or less always at the same point on a
>> given script, so that the most interesting feature is to report the
>> number of retries at the script level.
>
> This may depend on various factors.. for example:
> [...]
> 21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x
> = :x3;
> 21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x
> = :x2;

Ok, not always the same point, and you confirm that it identifies where
the error is raised which leads to a retry.

> And you can always get the number of retries at the script level from the
> main report (if only one script is used) or from the report for each script
> (if multiple scripts are used).

Ok.

>> ISTM that "skipped" transactions are NOT "successful" so there are a
>> problem with comments. I believe that your formula are probably right,
>> it has more to do with what is "success". For cnt decomposition, ISTM
>> that "other transactions" are really "directly successful
>> transactions".
>
> I agree with you, but I also think that skipped transactions should not be
> considered errors.

I'm ok with having a special category for them in the explanations, which
is neither success nor error.

> So we can write something like this:

> All the transactions are divided into several types depending on their
> execution. Firstly, they can be divided into transactions that we started to
> execute, and transactions which were skipped (it was too late to execute
> them). Secondly, running transactions fall into 2 main types: is there any
> command that got a failure during the last execution of the transaction
> script or not? Thus

Here is an attempt at having a more precise and shorter version, not sure
it is much better than yours, though:

"""
Transactions are counted depending on their execution and outcome. First
a transaction may have started or not: skipped transactions occur under
--rate and --latency-limit when the client is too late to execute them.
Secondly, a started transaction may ultimately succeed or fail on some
error, possibly after some retries when --max-tries is not one. Thus
"""

> the number of all transactions =
> skipped (it was too late to execute them)
> cnt (the number of successful transactions) +
> ecnt (the number of failed transactions).
>
> A successful transaction can have several unsuccessful tries before a
> successfull run. Thus
>
> cnt (the number of successful transactions) =
> retried (they got a serialization or a deadlock failure(s), but were
> successfully retried from the very beginning) +
> directly successfull transactions (they were successfully completed on
> the first try).

These above description is clearer for me.

>> I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise
>> "another" does not make sense yet.
>
> Maybe firstly put a general group, and then special cases?...

I understand it more as a catch all default "none of the above" case.

>> In TState, field "uint32 retries": maybe it would be simpler to count
>> "tries", which can be compared directly to max tries set in the option?
>
> If you mean retries in CState - on the one hand, yes, on the other hand,
> statistics always use the number of retries...

Ok.

>> The automaton skips to FAILURE on every possible error. I'm wondering
>> whether it could do so only on SQL errors, because other fails will
>> lead to ABORTED anyway? If there is no good reason to skip to FAILURE
>> from some errors, I'd suggest to keep the previous behavior. Maybe the
>> good reason is to do some counting, but this means that on eg
>> metacommand errors now the script would loop over instead of aborting,
>> which does not look like a desirable change of behavior.
>
> Even in the case of meta command errors we must prepare for CSTATE_END_TX and
> the execution of the next script: if necessary, clear the conditional stack
> and rollback the current transaction block.

Seems ok.

>> commandFailed: I'm not thrilled by the added boolean, which is partially
>> redundant with the second argument.
>
> Do you mean that it is partially redundant with the argument "cmd" and, for
> example, the meta commands errors always do not cause the abortions of the
> client?

Yes. And also I'm not sure we should want this boolean at all.

> [...]
> If in such cases one command is placed on several lines, ISTM that the code
> is more understandable if curly brackets are used...

Hmmm. Such basic style changes are avoided because they break
backpatching, so we try to avoid gratuitous changes unless there is a
strong added value, which does not seem to be the case here.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Bonaud 2018-08-17 07:57:08 Re: Doc patch: pg_upgrade page and checkpoint location consistency with replicas
Previous Message Magnus Hagander 2018-08-17 07:48:26 Re: Problem with OpenSCG downloads