Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-07-09 13:05:09
Message-ID: alpine.DEB.2.21.1807091451520.17811@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Marina,

> v9-0004-Pgbench-errors-and-serialization-deadlock-retries.patch
> - the main patch for handling client errors and repetition of transactions
> with serialization/deadlock failures (see the detailed description in the
> file).

Here is a review for the last part of your v9 version.

Patch does not "git apply" (may anymore):
error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply

However I could get it to apply with the "patch" command.

Then patch compiles, global & pgbench "make check" are ok.


The patch adds the ability to restart transactions (i.e. the full script)
on some errors, which is a good thing as it allows to exercice postgres
performance in more realistic scenarii.

* -d/--debug: I'm not in favor in requiring a mandatory text argument on this
option. It is not pratical, the user has to remember it, and it is a change.
I'm sceptical of the overall debug handling changes. Maybe we could have
multiple -d which lead to higher debug level, but I'm not sure that it can be
made to work for this case and still be compatible with the previous behavior.
Maybe you need a specific option for your purpose, eg "--debug-retry"?


* The implementation is less complex that the previous submission, which
is a good thing. I'm not sure that all the remaining complexity is still
fully needed.

* I'm reserved about the whole ereport thing, see comments in other

Leves ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me.
In particular, the "CLIENT" part is not very useful. If the
distinction makes sense, I would have kept "LOG" for the initial one and
add other ones for ABORT and PGBENCH, maybe.

* There are no comments about "retries" in StatData, CState and Command

* Also, for StatData, I would like to understand the logic between cnt,
skipped, retries, retried, errors, ... so a clear information about the
expected invariant if any would be welcome. One has to go in the code to
understand how these fields relate one to the other.

* "errors_in_failed_tx" is some subcounter of "errors", for a special
case. Why it is there fails me [I finally understood, and I think it
should be removed, see end of review]. If we wanted to distinguish, then
we should distinguish homogeneously: maybe just count the different error
types, eg have things like "deadlock_errors", "serializable_errors",
"other_errors", "internal_pgbench_errors" which would be orthogonal one to
the other, and "errors" could be recomputed from these.

* How "errors" differs from "ecnt" is unclear to me.

* FailureStatus states are not homogeneously named. I'd suggest to use
*_FAILURE for all cases. The miscellaneous case should probably be the
last. I do not understand the distinction between ANOTHER_FAILURE &
IN_FAILED_SQL_TRANSACTION. Why should it be needed? [again, see and of

* I do not understand the comments on CState enum: "First, remember the failure
in CSTATE_FAILURE. Then process other commands of the failed transaction if any"
Why would other commands be processed at all if the transaction is aborted?
For me any error must leads to the rollback and possible retry of the
transaction. This comment needs to be clarified. It should also say
that on FAILURE, it will go either to RETRY or ABORTED. See below my
comments about doCustom.

It is unclear to me why their could be several failures within a
transaction, as I would have stopped that it would be aborted on the first

* I do not undestand the purpose of first_failure. The comment should explain
why it would need to be remembered. From my point of view, I'm not fully
convinced that it should.

* commandFailed: I think that it should be kept much simpler. In
particular, having errors on errors does not help much: on ELEVEL_FATAL,
it ignores the actual reported error and generates another error of the
same level, so that the initial issue is hidden. Even if these are can't
happen cases, hidding the origin if it occurs looks unhelpful. Just print
it directly, and maybe abort if you think that it is a can't happen case.

* copyRandomState: just use sizeof(RandomState) instead of making assumptions
about the contents of the struct. Also, this function looks pretty useless,
why not just do a plain assignment?

* copyVariables: lacks comments to explain that the destination is cleaned up
and so on. The cleanup phase could probaly be in a distinct function, so that
the code would be clearer. Maybe the function variable names are too long.

if (current_source->svalue)

in the context of a guard for a strdup, maybe:

if (current_source->svalue != NULL)

* executeCondition: this hides client automaton state changes which were
clearly visible beforehand in the switch, and the different handling of
if & elif is also hidden.

I'm against this unnecessary restructuring and to hide such an information,
all state changes should be clearly seen in the state switch so that it is
easier to understand and follow.

I do not see why touching the conditional stack on internal errors
(evaluateExpr failure) brings anything, the whole transaction will be aborted

* doCustom changes.

On CSTATE_START_COMMAND, it considers whether to retry on the end.
For me, this cannot happen: if some command failed, then it should have
skipped directly to the RETRY state, so that you cannot get to the end
of the script with an error. Maybe you could assert that the state of the
previous command is NO_FAILURE, though.

On CSTATE_FAILURE, the next command is possibly started. Although there is some
consistency with the previous point, I think that it totally breaks the state
automaton where now a command can start while the whole transaction is
in failing state anyway. There was no point in starting it in the first

So, for me, the FAILURE state should record/count the failure, then skip
to RETRY if a retry is decided, else proceed to ABORT. Nothing else.
This is much clearer that way.

Then RETRY should reinstate the global state and proceed to start the *first*
command again.

The current RETRY state does memory allocations to generate a message
with buffer allocation and so on. This looks like a costly and useless
operation. If the user required "retries", then this is normal behavior,
the retries are counted and will be printed out in the final report,
and there is no point in printing out every single one of them.
Maybe you want that debugging, but then coslty operations should be guarded.

It is unclear to me why backslash command errors are turned to FAILURE
instead of ABORTED: there is no way they are going to be retried, so
maybe they should/could skip directly to ABORTED?

Function executeCondition is a bad idea, as stated above.

* reporting

The number of transactions above the latency limit report can be simplified.
Remove the if and just use one printf f with a %s for the optional comment.
I'm not sure this optional comment is useful there.

Before the patch, ISTM that all lines relied on one printf. you have
changed to a style where a collection of printf is used to compose a line.
I'd suggest to keep to the previous one-printf-prints-one-line style,
where possible.

You have added 20-columns alignment prints. This looks like too much and
generates much too large lines. Probably 10 (billion) would be enough.

Some people try to parse the output, so it should be deterministic. I'd add
the needed columns always if appropriate (i.e. under retry), even if none

* processXactStats: An else is replaced by a detailed stats, with the initial
"no detailed stats" comment kept. The function is called both in the thenb
& else branch. The structure does not make sense anymore. I'm not sure
this changed was needed.

* getLatencyUsed: declared "double" so "return 0.0".

* typo: ruin -> run; probably others, I did not check for them in detail.

TAP Tests

On my laptop, tests last 5.5 seconds before the patch, and about 13 seconds
after. This is much too large. Pgbench TAP tests do not deserve to take over
twice as much time as before just on this patch.

One reason which explains this large time is there is a new script with a
new created instance. I'd suggest to append tests to the existing 2
scripts, depending on whether they need a running instance or not.

Secondly, I think that the design of the tests are too heavy. For such a
feature, ISTM enough to check that it works, i.e. one test for deadlocks
(trigger one or a few deadlocks), idem for serializable, maybe idem for
other errors if any.

The challenge is to do that reliably and efficiently, i.e. so that the test does
not rely on chance and is still quite efficient.

The trick you use is to run an interactive psql in parallel to pgbench so as to
play with concurrent locks. That is interesting, but deserves more comments
and explanatation, eg before the test functions.

Maybe this could be achieved within pgbench by using some wait stuff in
PL/pgSQL so that concurrent client can wait one another based on data in
unlogged table updated by a CALL within an "embedded" transactions? Not
sure. Otherwise, maybe (simple) pgbench-side thread barrier could help,
but this would require more thinking.

Anyway, TAP tests should be much lighter (in total time), and if possible
much simpler.

The latency limit to 900 ms try is a bad idea because it takes a lot of time.
I did such tests before and they were removed by Tom Lane because of determinism
and time issues. I would comment this test out for now.


Not looked at in much details for now. Just a few comments:

Having the "most important settings" on line 1-6 and 8 (i.e. skipping 7) looks
silly. The important ones should simply be the first ones, and the 8th is not
that important, or it is in 7th position.

I do not understand why there is so much text about in failed sql transaction
stuff, while we are mainly interested in serialization & deadlock errors, and
this only falls in some "other" category. There seems to be more details about
other errors that about deadlocks & serializable errors.

The reporting should focus on what is of interest, either all errors, or some
detailed split of these errors. The documentation should state clearly what
are the counted errors, and then what are their effects on the reported stats.
The "Errors and Serialization/Deadlock Retries" section is a good start in that
direction, but it does not talk about pgbench internal errors (eg "cos(true)").
I think it should more explicit about errors.

Option --max-tries default value should be spelled out in the doc.

"Client's run is aborted", do you mean "Pgbench run is aborted"?

"If a failed transaction block does not terminate in the current script":
this just looks like a very bad idea, and explains my general ranting
above about this error condition. ISTM that the only reasonable option
is that a pgbench script should be inforced as a transaction, or a set of
transactions, but cannot be a "piece" of transaction, i.e. pgbench script
with "BEGIN;" but without a corresponding "COMMIT" is a user error and
warrants an abort, so that there is no need to manage these "in aborted
transaction" errors every where and report about them and document them

This means adding a check when a script is finished or starting that
PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and abort if not
with a fatal error. Then we can forget about these "in tx errors" counting,
reporting and so on, and just have to document the restriction.


In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-07-09 13:12:36 Re: [HACKERS] PoC: full merge join on comparison clause
Previous Message David Fetter 2018-07-09 13:01:05 Re: How to set array element to null value