Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-07-11 09:06:05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09-07-2018 16:05, Fabien COELHO wrote:
> Hello Marina,

Hello, Fabien!

> Here is a review for the last part of your v9 version.

Thank you very much for this!

> Patch does not "git apply" (may anymore):
> error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
> error: doc/src/sgml/ref/pgbench.sgml: patch does not apply

Sorry, I'll send a new version soon.

> However I could get it to apply with the "patch" command.
> Then patch compiles, global & pgbench "make check" are ok.


> Feature
> =======
> The patch adds the ability to restart transactions (i.e. the full
> script)
> on some errors, which is a good thing as it allows to exercice postgres
> performance in more realistic scenarii.
> * -d/--debug: I'm not in favor in requiring a mandatory text argument
> on this
> option. It is not pratical, the user has to remember it, and it is a
> change.
> I'm sceptical of the overall debug handling changes. Maybe we could
> have
> multiple -d which lead to higher debug level, but I'm not sure that it
> can be
> made to work for this case and still be compatible with the previous
> behavior.
> Maybe you need a specific option for your purpose, eg "--debug-retry"?

As you wrote in [1], adding an additional option is also a bad idea:

> I'm sceptical of the "--debug-fails" options. ISTM that --debug is
> already there
> and should just be reused.

Maybe it's better to use an optional argument/arguments for
compatibility (--debug[=fails] or --debug[=NUM])? But if we use the
numbers, now I can see only 2 levels, and there's no guarantee that they
will no change..

> Code
> ====
> * The implementation is less complex that the previous submission,
> which is a good thing. I'm not sure that all the remaining complexity
> is still fully needed.
> * I'm reserved about the whole ereport thing, see comments in other
> messages.

Thank you, I'll try to implement the error reporting in the way you

> Leves ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me.
> In particular, the "CLIENT" part is not very useful. If the
> distinction makes sense, I would have kept "LOG" for the initial one
> and
> add other ones for ABORT and PGBENCH, maybe.


> * There are no comments about "retries" in StatData, CState and Command
> structures.
> * Also, for StatData, I would like to understand the logic between cnt,
> skipped, retries, retried, errors, ... so a clear information about the
> expected invariant if any would be welcome. One has to go in the code
> to
> understand how these fields relate one to the other.
> <...>
> * How "errors" differs from "ecnt" is unclear to me.

Thank you, I'll fix this.

> * commandFailed: I think that it should be kept much simpler. In
> particular, having errors on errors does not help much: on
> ELEVEL_FATAL, it ignores the actual reported error and generates
> another error of the same level, so that the initial issue is hidden.
> Even if these are can't happen cases, hidding the origin if it occurs
> looks unhelpful. Just print it directly, and maybe abort if you think
> that it is a can't happen case.

Oh, thanks, my mistake(

> * copyRandomState: just use sizeof(RandomState) instead of making
> assumptions
> about the contents of the struct. Also, this function looks pretty
> useless,
> why not just do a plain assignment?
> * copyVariables: lacks comments to explain that the destination is
> cleaned up
> and so on. The cleanup phase could probaly be in a distinct function,
> so that
> the code would be clearer. Maybe the function variable names are too
> long.

Thank you, I'll fix this.

> if (current_source->svalue)
> in the context of a guard for a strdup, maybe:
> if (current_source->svalue != NULL)

I'm sorry, I'll fix this.

> * I do not understand the comments on CState enum: "First, remember the
> failure
> in CSTATE_FAILURE. Then process other commands of the failed
> transaction if any"
> Why would other commands be processed at all if the transaction is
> aborted?
> For me any error must leads to the rollback and possible retry of the
> transaction. This comment needs to be clarified. It should also say
> that on FAILURE, it will go either to RETRY or ABORTED. See below my
> comments about doCustom.
> It is unclear to me why their could be several failures within a
> transaction, as I would have stopped that it would be aborted on the
> first one.
> * I do not undestand the purpose of first_failure. The comment should
> explain
> why it would need to be remembered. From my point of view, I'm not
> fully
> convinced that it should.
> <...>
> * executeCondition: this hides client automaton state changes which
> were
> clearly visible beforehand in the switch, and the different handling of
> if & elif is also hidden.
> I'm against this unnecessary restructuring and to hide such an
> information,
> all state changes should be clearly seen in the state switch so that it
> is
> easier to understand and follow.
> I do not see why touching the conditional stack on internal errors
> (evaluateExpr failure) brings anything, the whole transaction will be
> aborted
> anyway.
> * doCustom changes.
> On CSTATE_START_COMMAND, it considers whether to retry on the end.
> For me, this cannot happen: if some command failed, then it should have
> skipped directly to the RETRY state, so that you cannot get to the end
> of the script with an error. Maybe you could assert that the state of
> the
> previous command is NO_FAILURE, though.
> On CSTATE_FAILURE, the next command is possibly started. Although there
> is some
> consistency with the previous point, I think that it totally breaks the
> state
> automaton where now a command can start while the whole transaction is
> in failing state anyway. There was no point in starting it in the first
> place.
> So, for me, the FAILURE state should record/count the failure, then
> skip
> to RETRY if a retry is decided, else proceed to ABORT. Nothing else.
> This is much clearer that way.
> Then RETRY should reinstate the global state and proceed to start the
> *first*
> command again.
> <...>
> It is unclear to me why backslash command errors are turned to FAILURE
> instead of ABORTED: there is no way they are going to be retried, so
> maybe they should/could skip directly to ABORTED?
> Function executeCondition is a bad idea, as stated above.

So do you propose to execute the command "ROLLBACK" without calculating
its latency etc. if we are in a failed transaction and clear the
conditional stack after each failure?

Also just to be clear: do you want to have the state CSTATE_ABORTED for
client abortion and another state for interrupting the current

> The current RETRY state does memory allocations to generate a message
> with buffer allocation and so on. This looks like a costly and useless
> operation. If the user required "retries", then this is normal
> behavior,
> the retries are counted and will be printed out in the final report,
> and there is no point in printing out every single one of them.
> Maybe you want that debugging, but then coslty operations should be
> guarded.

I think we need these debugging messages because, for example, if you
use the option --latency-limit, you we will never know in advance
whether the serialization/deadlock failure will be retried or not. They
also help to understand which limit of retries was violated or how close
we were to these limits during the execution of a specific transaction.
But I agree with you that they are costly and can be skipped if the
failure type is never retried. Maybe it is better to split them into
multiple error function calls?..

> * reporting
> The number of transactions above the latency limit report can be
> simplified.
> Remove the if and just use one printf f with a %s for the optional
> comment.
> I'm not sure this optional comment is useful there.

Oh, thanks, my mistake(

> Before the patch, ISTM that all lines relied on one printf. you have
> changed to a style where a collection of printf is used to compose a
> line. I'd suggest to keep to the previous one-printf-prints-one-line
> style, where possible.


> You have added 20-columns alignment prints. This looks like too much
> and
> generates much too large lines. Probably 10 (billion) would be enough.

I have already asked you about this in [2]:
> The variables for the numbers of failures and retries are of type int64
> since the variable for the total number of transactions has the same
> type. That's why such a large alignment (as I understand it now, enough
> 20 characters). Do you prefer floating alignemnts, depending on the
> maximum number of failures/retries for any command in any script?

> Some people try to parse the output, so it should be deterministic. I'd
> add
> the needed columns always if appropriate (i.e. under retry), even if
> none
> occured.


> * processXactStats: An else is replaced by a detailed stats, with the
> initial
> "no detailed stats" comment kept. The function is called both in the
> thenb
> & else branch. The structure does not make sense anymore. I'm not sure
> this changed was needed.
> * getLatencyUsed: declared "double" so "return 0.0".
> * typo: ruin -> run; probably others, I did not check for them in
> detail.

Oh, thanks, my mistakes(

> TAP Tests
> =========
> On my laptop, tests last 5.5 seconds before the patch, and about 13
> seconds
> after. This is much too large. Pgbench TAP tests do not deserve to take
> over
> twice as much time as before just on this patch.
> One reason which explains this large time is there is a new script
> with a new created instance. I'd suggest to append tests to the
> existing 2 scripts, depending on whether they need a running instance
> or not.

Ok! All new tests that do not need a running instance are already added
to the file

> Secondly, I think that the design of the tests are too heavy. For such
> a feature, ISTM enough to check that it works, i.e. one test for
> deadlocks (trigger one or a few deadlocks), idem for serializable,
> maybe idem for other errors if any.
> <...>
> The latency limit to 900 ms try is a bad idea because it takes a lot of
> time.
> I did such tests before and they were removed by Tom Lane because of
> determinism
> and time issues. I would comment this test out for now.

Ok! If it doesn't bother you - can you tell more about the causes of
these determinism issues?.. Tests for some other failures that cannot be
retried are already added to

> The challenge is to do that reliably and efficiently, i.e. so that the
> test does
> not rely on chance and is still quite efficient.
> The trick you use is to run an interactive psql in parallel to pgbench
> so as to
> play with concurrent locks. That is interesting, but deserves more
> comments
> and explanatation, eg before the test functions.
> Maybe this could be achieved within pgbench by using some wait stuff
> in PL/pgSQL so that concurrent client can wait one another based on
> data in unlogged table updated by a CALL within an "embedded"
> transactions? Not sure.
> <...>
> Anyway, TAP tests should be much lighter (in total time), and if
> possible much simpler.

I'll try, thank you..

> Otherwise, maybe (simple) pgbench-side thread
> barrier could help, but this would require more thinking.

Tests must pass if we use --disable-thread-safety..

> Documentation
> =============
> Not looked at in much details for now. Just a few comments:
> Having the "most important settings" on line 1-6 and 8 (i.e. skipping
> 7) looks
> silly. The important ones should simply be the first ones, and the 8th
> is not
> that important, or it is in 7th position.


> I do not understand why there is so much text about in failed sql
> transaction
> stuff, while we are mainly interested in serialization & deadlock
> errors, and
> this only falls in some "other" category. There seems to be more
> details about
> other errors that about deadlocks & serializable errors.
> The reporting should focus on what is of interest, either all errors,
> or some
> detailed split of these errors.
> <...>
> * "errors_in_failed_tx" is some subcounter of "errors", for a special
> case. Why it is there fails me [I finally understood, and I think it
> should be removed, see end of review]. If we wanted to distinguish,
> then we should distinguish homogeneously: maybe just count the
> different error types, eg have things like "deadlock_errors",
> "serializable_errors", "other_errors", "internal_pgbench_errors" which
> would be orthogonal one to the other, and "errors" could be recomputed
> from these.

Thank you, I agree with you. Unfortunately each new error type adds a
new 1 or 2 columns of maximum width 20 to the per-statement report (to
report errors and possibly retries of this type in this statement) and
we already have 2 new columns for all errors and retries. So I'm not
sure that we need add anything other than statistics only about all the
errors and all the retries in general.

> The documentation should state clearly what
> are the counted errors, and then what are their effects on the reported
> stats.
> The "Errors and Serialization/Deadlock Retries" section is a good start
> in that
> direction, but it does not talk about pgbench internal errors (eg
> "cos(true)").
> I think it should more explicit about errors.

Thank you, I'll try to improve it.

> Option --max-tries default value should be spelled out in the doc.

If you mean that it is set to 1 if neither of the options --max-tries or
--latency-limit is explicitly used, I'll fix this.

> "Client's run is aborted", do you mean "Pgbench run is aborted"?

No, other clients continue their run as usual.

> * FailureStatus states are not homogeneously named. I'd suggest to use
> *_FAILURE for all cases. The miscellaneous case should probably be the
> last. I do not understand the distinction between ANOTHER_FAILURE &
> IN_FAILED_SQL_TRANSACTION. Why should it be needed? [again, see and of
> review]
> <...>
> "If a failed transaction block does not terminate in the current
> script":
> this just looks like a very bad idea, and explains my general ranting
> above about this error condition. ISTM that the only reasonable option
> is that a pgbench script should be inforced as a transaction, or a set
> of
> transactions, but cannot be a "piece" of transaction, i.e. pgbench
> script
> with "BEGIN;" but without a corresponding "COMMIT" is a user error and
> warrants an abort, so that there is no need to manage these "in aborted
> transaction" errors every where and report about them and document them
> extensively.
> This means adding a check when a script is finished or starting that
> PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and abort if
> not
> with a fatal error. Then we can forget about these "in tx errors"
> counting,
> reporting and so on, and just have to document the restriction.



Marina Polyakova
Postgres Professional:
The Russian Postgres Company

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2018-07-11 09:27:33 Negotiating the SCRAM channel binding type
Previous Message Yugo Nagata 2018-07-11 09:04:08 Allow to specify a index name as ANALYZE parameter