Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-03-27 14:49:02
Message-ID: 9c5bf620e050e0803d2a8828a9e9bdb4@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 26-03-2018 18:53, Fabien COELHO wrote:
> Hello Marina,

Hello!

>> Many thanks to both of you! I'm working on a patch in this direction..
>>
>>> I think that the best approach for now is simply to reset (command
>>> zero, random generator) and start over the whole script, without
>>> attempting to be more intelligent. The limitations should be clearly
>>> documented (one transaction per script), though. That would be a
>>> significant enhancement already.
>>
>> I'm not sure that we can always do this, because we can get new errors
>> until we finish the failed transaction block, and we need destroy the
>> conditional stack..
>
> Sure. I'm suggesting so as to simplify that on failures the retry
> would always restarts from the beginning of the script by resetting
> everything, indeed including the conditional stack, the random
> generator state, the variable values, and so on.
>
> This mean enforcing somehow one script is one transaction.
>
> If the user does not do that, it would be their decision and the
> result becomes unpredictable on errors (eg some sub-transactions could
> be executed more than once).
>
> Then if more is needed, that could be for another patch.

Here is the fifth version of the patch for pgbench (based on the commit
4b9094eb6e14dfdbed61278ea8e51cc846e43579) where I tried to implement
these ideas, thanks to your comments and those of Teodor Sigaev. Since
we may need to execute commands to complete a failed transaction block,
the script is now always executed completely. If there is a
serialization/deadlock failure which can be retried, the script is
executed again with the same random state and array of variables as
before its first run. Meta commands errors as well as all SQL errors do
not cause the aborting of the client. The first failure in the current
script execution determines whether the script run will be retried or
not, so only such failures (they have a retry) or errors (they are not
retried) are reported.

I tried to make fixes in accordance with your previous reviews ([1],
[2], [3]):

> I'm unclear about the added example added in the documentation. There
> are 71% errors, but 100% of transactions are reported as processed. If
> there were errors, then it is not a success, so the transaction were
> not
> processed? To me it looks inconsistent. Also, while testing, it seems
> that
> failed transactions are counted in tps, which I think is not
> appropriate:
>
>
> About the feature:
>
> sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
> ./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
> starting vacuum...end.
> progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
> # NOT 10845.8 TPS...
> progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
> progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
> ...
> number of transactions actually processed: 32028 # NO!
> number of errors: 30969 (96.694 %)
> latency average = 2.833 ms
> latency stddev = 1.508 ms
> tps = 10666.720870 (including connections establishing) # NO
> tps = 10683.034369 (excluding connections establishing) # NO
> ...
>
> For me this is all wrong. I think that the tps report is about
> transactions
> that succeeded, not mere attempts. I cannot say that a transaction
> which aborted
> was "actually processed"... as it was not.

Fixed

> The order of reported elements is not logical:
>
> maximum number of transaction tries: 100
> scaling factor: 10
> query mode: prepared
> number of clients: 4
> number of threads: 2
> duration: 3 s
> number of transactions actually processed: 967
> number of errors: 152 (15.719 %)
> latency average = 9.630 ms
> latency stddev = 13.366 ms
> number of transactions retried: 623 (64.426 %)
> number of retries: 32272
>
> I would suggest to group everything about error handling in one block,
> eg something like:
>
> scaling factor: 10
> query mode: prepared
> number of clients: 4
> number of threads: 2
> duration: 3 s
> number of transactions actually processed: 967
> number of errors: 152 (15.719 %)
> number of transactions retried: 623 (64.426 %)
> number of retries: 32272
> maximum number of transaction tries: 100
> latency average = 9.630 ms
> latency stddev = 13.366 ms

Fixed

> Also, percent character should be stuck to its number: 15.719% to have
> the style more homogeneous (although there seems to be pre-existing
> inhomogeneities).
>
> I would replace "transaction tries/retried" by "tries/retried",
> everything
> is about transactions in the report anyway.
>
> Without reading the documentation, the overall report semantics is
> unclear,
> especially given the absurd tps results I got with the my first
> attempt,
> as failing transactions are counted as "processed".

Fixed

> About the code:
>
> I'm at lost with the 7 states added to the automaton, where I would
> have hoped
> that only 2 (eg RETRY & FAIL, or even less) would be enough.

Fixed

> I'm wondering whether the whole feature could be simplified by
> considering that one script is one "transaction" (it is from the
> report point of view at least), and that any retry is for the full
> script only, from its beginning. That would remove the trying to guess
> at transactions begin or end, avoid scanning manually for subcommands,
> and so on.
> - Would it make sense?
> - Would it be ok for your use case?

Fixed

> The proposed version of the code looks unmaintainable to me. There are
> 3 levels of nested "switch/case" with state changes at the deepest
> level.
> I cannot even see it on my screen which is not wide enough.

Fixed

> There should be a typedef for "random_state", eg something like:
>
> typedef struct { unsigned short data[3]; } RandomState;
>
> Please keep "const" declarations, eg "commandFailed".
>
> I think that choosing script should depend on the thread random state,
> not
> the client random state, so that a run would generate the same pattern
> per
> thread, independently of which client finishes first.
>
> I'm sceptical of the "--debug-fails" options. ISTM that --debug is
> already there
> and should just be reused.

Fixed

> I agree that function naming style is a already a mess, but I think
> that
> new functions you add should use a common style, eg "is_compound" vs
> "canRetry".

Fixed

> Translating error strings to their enum should be put in a function.

Removed

> I'm not sure this whole thing should be done anyway.

The processing of compound commands is removed.

> The "node" is started but never stopped.

Fixed

> For file contents, maybe the << 'EOF' here-document syntax would help
> instead
> of using concatenated backslashed strings everywhere.

I'm sorry, but I could not get it to work with regular expressions :(

> I'd start by stating (i.e. documenting) that the features assumes that
> one
> script is just *one* transaction.
>
> Note that pgbench somehow already assumes that one script is one
> transaction when it reports performance anyway.
>
> If you want 2 transactions, then you have to put them in two scripts,
> which looks fine with me. Different transactions are expected to be
> independent, otherwise they should be merged into one transaction.

Fixed

> Under these restrictions, ISTM that a retry is something like:
>
> case ABORTED:
> if (we want to retry) {
> // do necessary stats
> // reset the initial state (random, vars, current command)
> state = START_TX; // loop
> }
> else {
> // count as failed...
> state = FINISHED; // or done.
> }
> break;
...
> I'm fine with having END_COMMAND skipping to START_TX if it can be done
> easily and cleanly, esp without code duplication.

I did not want to add the additional if-expressions possibly to most of
the code in CSTATE_START_TX/CSTATE_END_TX/CSTATE_END_COMMAND, so
CSTATE_FAILURE is used instead of CSTATE_END_COMMAND in case of failure,
and CSTATE_RETRY is called before CSTATE_END_TX if there was a failure
during the current script execution.

> ISTM that ABORTED & FINISHED are currently exactly the same. That would
> put a particular use to aborted. Also, there are many points where the
> code may go to "aborted" state, so reusing it could help avoid
> duplicating
> stuff on each abort decision.

To end and rollback the failed transaction block the script is always
executed completely, and after the failure the following script command
is executed..

[1]
https://www.postgresql.org/message-id/alpine.DEB.2.20.1801031720270.20034%40lancre
[2]
https://www.postgresql.org/message-id/alpine.DEB.2.20.1801121309300.10810%40lancre
[3]
https://www.postgresql.org/message-id/alpine.DEB.2.20.1801121607310.13422%40lancre

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v5-0001-Pgbench-errors-and-serialization-deadlock-retries.patch text/x-diff 122.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2018-03-27 14:51:35 Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Previous Message Simon Riggs 2018-03-27 14:48:48 Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()