Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-08-07 10:56:50
Lists: pgsql-hackers

Hello, hackers!

Here is the tenth version of the patch for error handling and retrying of
transactions with serialization/deadlock failures in pgbench (based on
commit e0ee93053998b159e395deed7c42e02b1f921552), following the comments
of Fabien Coelho and Alvaro Herrera in this thread. It consists of:

- a patch for the RandomState structure (used to reset a client's random
seed when repeating transactions after serialization/deadlock failures);

- a patch for a separate error reporting function (used to report client
failures that do not cause an abort; what is reported depends on the
debugging level);

- a patch for the Variables structure (used to reset client variables
when repeating transactions after serialization/deadlock failures);

- the main patch for handling client errors and repeating transactions
with serialization/deadlock failures (see the detailed description in the
file).

As Fabien wrote in [5], some of the new tests were too slow: earlier they
increased the pgbench testing time on my laptop from 5.5 seconds to 12.5
seconds. In the new version the pgbench tests take about 7 seconds. They
include one test for serialization failure and retry and one test for
deadlock failure and retry. Both are in one file, each test uses only one
pgbench run, and they use PL/pgSQL scripts instead of a parallel psql
session.

Any suggestions are welcome!

All that was fixed since the previous version:


> ISTM that the struct itself does not need a name, ie. "typedef struct {
> ... } RandomState" is enough.

> There could be clear comments, say in the TState and CState structs, about
> what randomness is impacted (i.e. script choices, etc.).

> getZipfianRand, computeHarmonicZipfian: The "thread" parameter was
> justified because it was used for two fields. As the random state is
> separated, I'd suggest that the other argument should be a zipfcache
> pointer.

> While reading your patch, it occurs to me that a run is not deterministic
> at the thread level under throttling and sampling, because the random
> state is solicited differently depending on when a transaction ends. This
> suggests that maybe each thread random_state use should have its own
> random state.


> The structure typedef does not need a name. "typedef struct { } V...".

> I tend to disagree with naming things after their type, eg "array". I'd
> suggest "vars" instead. "nvariables" could be "nvars" for consistency with
> that and "vars_sorted", and because "foo.variables->nvariables" starts
> looking heavy.

> I'd suggest to put the "Variables" type declaration just after the
> "Variable" type declaration in the file.


> The semantics of the existing code is changed: the FATAL level calls
> abort() and replaces existing exit(1) calls. Maybe you want an ERROR
> level as well.

> I do not understand why names are changed, eg ELEVEL_FATAL instead of
> FATAL. ISTM that part of the point of the move would be to be homogeneous,
> which suggests that the same names should be reused.


> I'd suggest to have just one clean and simple pgbench internal function to
> handle errors and possibly exit, debug... Something like
>     void pgb_error(FATAL, "error %d raised", 12);
> Implemented as
>     void pgb_error(int/enum XXX level, const char *format, ...)
>     {
>         test level and maybe return immediately (eg debug);
>         print to stderr;
>         exit/abort/return depending;
>     }
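To make the suggestion concrete, here is a minimal sketch of such a helper. This is illustrative, not code from the patch: the level names and the messages_emitted counter are made up for the example (the counter exists only so the behaviour can be checked).

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the suggested helper -- names are illustrative. */
typedef enum
{
	ELEVEL_DEBUG,
	ELEVEL_LOG,
	ELEVEL_ERROR,
	ELEVEL_FATAL
} ErrorLevel;

static int	debug_enabled = 0;		/* would mirror pgbench's -d option */
static int	messages_emitted = 0;	/* for illustration/testing only */

static void
pgb_error(ErrorLevel level, const char *fmt, ...)
{
	va_list		ap;

	/* Test the level and maybe return immediately (eg debug). */
	if (level == ELEVEL_DEBUG && !debug_enabled)
		return;

	/* Print to stderr. */
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	messages_emitted++;

	/* Exit/abort/return depending on the level. */
	if (level == ELEVEL_FATAL)
		exit(1);
}
```

With this shape, suppressed debug messages cost nothing beyond the level check, which also addresses the later point about guarding costly message construction.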


> Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me.
> In particular, the "CLIENT" part is not very useful. If the distinction
> makes sense, I would have kept "LOG" for the initial one and add other
> ones for ABORT and PGBENCH, maybe.

> * There are no comments about "retries" in StatData, CState and Command
> structures.

> * Also, for StatData, I would like to understand the logic between cnt,
> skipped, retries, retried, errors, ... so a clear information about the
> expected invariant if any would be welcome. One has to go in the code to
> understand how these fields relate one to the other.

> * "errors_in_failed_tx" is some subcounter of "errors", for a special
> case. Why it is there escapes me [I finally understood, and I think it
> should be removed, see end of review]. If we wanted to distinguish, then
> we should distinguish homogeneously: maybe just count the different error
> types, eg have things like "deadlock_errors", "serializable_errors",
> "other_errors", "internal_pgbench_errors" which would be orthogonal one to
> the other, and "errors" could be recomputed from these.
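The orthogonal-counters idea can be sketched in a few lines; the struct and field names below are illustrative, not from the patch. The total is derived rather than stored, so the invariant cannot drift:

```c
/* Illustrative only: orthogonal per-type counters, total derived. */
typedef struct
{
	long		serialization_errors;
	long		deadlock_errors;
	long		other_errors;
} ErrorCounts;

/* "errors" is recomputed from the orthogonal counters, never stored. */
static long
total_errors(const ErrorCounts *ec)
{
	return ec->serialization_errors + ec->deadlock_errors + ec->other_errors;
}
```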

> * How "errors" differs from "ecnt" is unclear to me.

> * FailureStatus states are not homogeneously named. I'd suggest to use
> *_FAILURE for all cases. The miscellaneous case should probably be the
> last.

> * I do not understand the comments on CState enum: "First, remember the
> failure in CSTATE_FAILURE. Then process other commands of the failed
> transaction if any". Why would other commands be processed at all if the
> transaction is aborted? For me any error must lead to the rollback and
> possible retry of the transaction.
> ...
> So, for me, the FAILURE state should record/count the failure, then skip
> to RETRY if a retry is decided, else proceed to ABORT. Nothing else.
> This is much clearer that way.
> Then RETRY should reinstate the global state and proceed to start the
> *first* command again.

> * commandFailed: I think that it should be kept much simpler. In
> particular, having errors on errors does not help much: on error it
> ignores the actual reported error and generates another error of the same
> level, so that the initial issue is hidden. Even if these are can't
> happen cases, hiding the origin if it occurs looks unhelpful. Just print
> it directly, and maybe abort if you think that it is a can't happen
> case.

> * copyRandomState: just use sizeof(RandomState) instead of making
> assumptions about the contents of the struct. Also, this function looks
> pretty useless, why not just do a plain assignment?

> * copyVariables: lacks comments to explain that the destination is
> cleaned up and so on. The cleanup phase could probably be in a distinct
> function, so that the code would be clearer. Maybe the function variable
> names are too long.
>     if (current_source->svalue)
> in the context of a guard for a strdup, maybe:
>     if (current_source->svalue != NULL)

> * executeCondition: this hides client automaton state changes which were
> clearly visible beforehand in the switch, and the different handling of
> if & elif is also hidden.
> I'm against this unnecessary restructuring and hiding such information:
> all state changes should be clearly seen in the state switch so that it
> is easier to understand and follow.
> I do not see why touching the conditional stack on internal errors
> (evaluateExpr failure) brings anything, the whole transaction will be
> aborted anyway.

> The current RETRY state does memory allocations to generate a message
> with buffer allocation and so on. This looks like a costly and useless
> operation. If the user required "retries", then this is normal behavior:
> the retries are counted and will be printed out in the final report,
> and there is no point in printing out every single one of them.
> Maybe you want that for debugging, but then costly operations should be
> guarded.

> The number of transactions above the latency limit report can be
> simplified. Remove the if and just use one printf with a %s for the
> optional comment. I'm not sure this optional comment is useful there.
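The one-printf form could look like this sketch; the helper name and message text are made up for the example, and snprintf stands in for the direct printf so the result can be checked:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: the optional comment collapses to an empty string,
 * so a single printf-style call replaces the if/else pair.
 */
static int
format_latency_line(char *buf, size_t len, long count, const char *comment)
{
	return snprintf(buf, len,
					"number of transactions above the latency limit: %ld%s",
					count, comment != NULL ? comment : "");
}
```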

> Before the patch, ISTM that all lines relied on one printf. You have
> changed to a style where a collection of printfs is used to compose a
> line. I'd suggest to keep to the previous one-printf-prints-one-line
> style, where possible.

> You have added 20-column alignment prints. This looks like too much and
> generates much too long lines. Probably 10 (billion) would be enough.
> Some people try to parse the output, so it should be deterministic. I'd
> add the needed columns always if appropriate (i.e. under retry), even if
> none occurred.

> * processXactStats: An else is replaced by detailed stats, with the
> initial "no detailed stats" comment kept. The function is called both in
> the then & else branch. The structure does not make sense anymore. I'm
> not sure this change was needed.

> * getLatencyUsed: declared "double" so "return 0.0".

> * typo: ruin -> run; probably others, I did not check for them in
> detail.

> On my laptop, tests last 5.5 seconds before the patch, and about 13
> seconds after. This is much too large. Pgbench TAP tests do not deserve
> to take over twice as much time as before just on this patch.
> One reason which explains this large time is there is a new script with a
> new created instance. I'd suggest to append tests to the existing 2
> scripts, depending on whether they need a running instance or not.
> Secondly, I think that the design of the tests is too heavy. For such a
> feature, ISTM enough to check that it works, i.e. one test for deadlocks
> (trigger one or a few deadlocks), idem for serializable, maybe idem for
> other errors if any.
> The challenge is to do that reliably and efficiently, i.e. so that the
> test does not rely on chance and is still quite efficient.
> The trick you use is to run an interactive psql in parallel to pgbench so
> as to play with concurrent locks. That is interesting, but deserves more
> comments and explanation, eg before the test functions.
> Maybe this could be achieved within pgbench by using some wait stuff in
> PL/pgSQL so that concurrent clients can wait on one another based on data
> in an unlogged table updated by a CALL within "embedded" transactions?
> Not sure. ...
> Anyway, TAP tests should be much lighter (in total time), and if possible
> much simpler.
> Trying a latency limit of 900 ms is a bad idea because it takes a lot of
> time. I did such tests before and they were removed by Tom Lane because
> of determinism and time issues. I would comment this test out for now.

> Documentation
> ...
> Having the "most important settings" on lines 1-6 and 8 (i.e. skipping 7)
> looks silly. The important ones should simply be the first ones, and the
> 8th is not that important, or it is in 7th position.
> I do not understand why there is so much text about the in-failed-sql-
> transaction stuff, while we are mainly interested in serialization &
> deadlock errors, and this only falls in some "other" category. There
> seem to be more details about other errors than about deadlock &
> serializable errors.
> The reporting should focus on what is of interest, either all errors, or
> some detailed split of these errors. The documentation should state
> clearly what are the counted errors, and then what are their effects on
> the reported stats.
> The "Errors and Serialization/Deadlock Retries" section is a good start
> in that direction, but it does not talk about pgbench internal errors (eg
> "cos(true)"). I think it should be more explicit about errors.
> Option --max-tries default value should be spelled out in the doc.


> So if absolutely necessary, a new option is still better than changing
> --debug syntax. If not necessary, then it is better:-)

> The fact that some data are collected does not mean that they should all
> be reported in detail. We can have detailed error counts and report the
> sum of these errors for instance, or have some more verbose/detailed
> reports as options (eg --latencies does just that).


> LGTM, though I'd rename the random_state struct members so that it
> wouldn't look as confusing. Maybe that's just me.

> Please don't allocate Variable structs one by one. First time allocate
> some decent number (say 8) and then enlarge by duplicating the size. That
> way you save realloc overhead. We use this technique everywhere else, no
> reason to do differently here. Other than that, LGTM.
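The suggested growth policy can be sketched as follows; the function name and the initial capacity of 8 are taken from the comment above, everything else is illustrative:

```c
#include <stdlib.h>

typedef struct
{
	char	   *name;
	char	   *svalue;
} Variable;

#define INITIAL_VARS 8

/*
 * Grow the array geometrically instead of realloc'ing on every append.
 * Returns the (possibly enlarged) capacity, or -1 on allocation failure.
 */
static int
grow_vars(Variable **vars, int capacity, int needed)
{
	Variable   *newvars;

	if (needed <= capacity)
		return capacity;		/* nothing to do */

	if (capacity == 0)
		capacity = INITIAL_VARS;
	while (capacity < needed)
		capacity *= 2;			/* duplicate the size each time */

	newvars = realloc(*vars, capacity * sizeof(Variable));
	if (newvars == NULL)
		return -1;
	*vars = newvars;
	return capacity;
}
```

Amortized over n appends this does O(log n) reallocations instead of n.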


> If you really need some complex dynamic buffer, and I would prefer that
> you avoid that, then the fallback is:
>     if (level >= DEBUG)
>     {
>         initPQstuff(&msg);
>         ...
>         pgbench_error(DEBUG, "fixed message... %s\n", msg);
>         freePQstuff(&msg);
>     }
> The point is to avoid building the message with dynamic allocation if in
> the end it is not used.

Marina Polyakova
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch text/x-diff 10.8 KB
v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch text/x-diff 57.7 KB
v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch text/x-diff 17.1 KB
v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch text/x-diff 102.1 KB
