Re: WIP Patch: Pgbench Serialization and deadlock errors

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: WIP Patch: Pgbench Serialization and deadlock errors
Date: 2018-01-11 12:42:22
Message-ID: fad8aae42373a9b58b74b0af5c04a6cb@postgrespro.ru
Lists: pgsql-hackers

On 03-01-2018 19:28, Fabien COELHO wrote:
> Hello Marina,

Hello!

> Some comment about WIP pgbench error handling v4.

Many thanks!

> Patch applies, compiles, global & local "make check" are ok. doc
> compiles.
>
> I'm generally okay with having such a feature, but I'd like it to be
> *MUCH* simpler, otherwise it is going to make pgbench
> unmaintainable:-(

I understand your concern, but what exactly would you like to simplify
(apart from compound commands, the option --debug-fails, and the
processing of the end of a failed transaction block, which you discuss
later)?

> Also, ISTM that a new patch should address somehow (in the code or
> with explanation about why you disagree) all comments from the
> previous review, which is not really the case here, as I have to
> repeat some of the comments I did on v3. You should answer to the
> previous comment mail and tag all comments with "ok", "no because this
> or that", "done", "postponed because this or that...".

1) You can read my answer to your previous review [1]:
- if there's a short answer like "Ok!", "Thanks!", "Sorry, thanks",
"Thanks, I agree" or "Thanks, I'll fix it", it means "done";
- if there's an open question, it means that it is postponed because I
don't know exactly how to do it.

2) The second note of the --max-tries documentation. I wrote you a
suggestion in [1] and included it in the patch because you did not
answer me about it:

>> I do not understand the second note of the --max-tries documentation.
>> It seems to suggest that some script may not end their own
>> transaction...
>> which should be an error in my opinion? Some explanations would be
>> welcome.
>
> As you told me here [1], "I disagree about exit in ParseScript if the
> transaction block is not completed <...> and would break an existing
> feature.". Maybe it would be better to say this:
>
> In pgbench you can use scripts in which the transaction blocks do not
> end. Be careful in this case because transactions that span over more
> than one script are not rolled back and will not be retried in case of
> an error. In such cases, the script in which the error occurred is
> reported as failed.
>
> ?

3) About the tests. I wrote to you why they are so heavy in [1]:

>> The tap tests seem over-complicated and heavy with two pgbench run in
>> parallel... I'm not sure we really want all that complexity for this
>> somehow small feature. Moreover pgbench can run several scripts, I'm
>> not sure why two pgbench would need to be invoked. Could something
>> much simpler and lighter be proposed instead to test the feature?
>
> Firstly, two pgbench need to be invoked because we don't know which of
> them will get a deadlock failure. Secondly, I tried much simpler tests
> but all of them sometimes failed although everything was ok:
> - tests in which pgbench runs 5 clients and 10 transactions per client
> for a serialization/deadlock failure on any client (sometimes there
> are no failures when it is expected that there will be);
> - tests in which pgbench runs 30 clients and 400 transactions per
> client for a serialization/deadlock failure on any client (sometimes
> there are no failures when it is expected that there will be);
> - tests in which the psql session starts concurrently and sleep
> commands are used to wait for pgbench for 10 seconds (sometimes this
> does not work).
> Only advisory locks help me not to get such errors in the tests :(

In addition, as I wrote in [2], they have become lighter (28 simple
tests instead of 57, and 8 runs of pgbench instead of 17).

> About the documentation:
>
> Again, as already said in the previous review, please take into
> account comments or justify why you do not do so. I do not think that
> this feature should be given any pre-eminence: most of the time
> performance testing is about all-is-well transactions which do not
> display any error. I do not claim that it is not a useful feature; on
> the contrary, I do think that testing under error conditions is a good
> capability, but I just insist that it is a side feature which should
> not be put forward. As a consequence, the "maximum number of
> transaction tries" should certainly not be the default first output of
> a summary run.

I agree with you that my feature should not be given any pre-eminence.
Here are my reasons for the ordering in the reports (except for the
"maximum number of transaction tries", which you discuss later):
1) Per-statement report: errors and retries are printed at the end.
2) Other reports (main report, per-script report, progress report,
transaction logging, aggregation logging):
- errors are printed before optional results because you don't need any
options for getting errors;
- retries are printed at the end because they need the option
--max-tries.

> I'm unclear about the example added in the documentation. There are
> 71% errors, but 100% of transactions are reported as processed. If
> there were errors, then it is not a success, so the transactions were
> not processed? To me it looks inconsistent. Also, while testing, it
> seems that failed transactions are counted in tps, which I think is
> not appropriate:
>
>
> About the feature:
>
> sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
> ./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
> starting vacuum...end.
> progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
> # NOT 10845.8 TPS...
> progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
> progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
> ...
> number of transactions actually processed: 32028 # NO!
> number of errors: 30969 (96.694 %)
> latency average = 2.833 ms
> latency stddev = 1.508 ms
> tps = 10666.720870 (including connections establishing) # NO
> tps = 10683.034369 (excluding connections establishing) # NO
> ...
>
> For me this is all wrong. I think that the tps report is about
> transactions that succeeded, not mere attempts. I cannot say that a
> transaction which aborted was "actually processed"... as it was not.

Thank you, I agree and I will fix it.

> The order of reported elements is not logical:
>
> maximum number of transaction tries: 100
> scaling factor: 10
> query mode: prepared
> number of clients: 4
> number of threads: 2
> duration: 3 s
> number of transactions actually processed: 967
> number of errors: 152 (15.719 %)
> latency average = 9.630 ms
> latency stddev = 13.366 ms
> number of transactions retried: 623 (64.426 %)
> number of retries: 32272
>
> I would suggest to group everything about error handling in one block,
> eg something like:
>
> scaling factor: 10
> query mode: prepared
> number of clients: 4
> number of threads: 2
> duration: 3 s
> number of transactions actually processed: 967
> number of errors: 152 (15.719 %)
> number of transactions retried: 623 (64.426 %)
> number of retries: 32272
> maximum number of transaction tries: 100
> latency average = 9.630 ms
> latency stddev = 13.366 ms

Do you think it is ok when only the errors and the maximum number of
transaction tries are printed (since retries are only printed when they
are non-zero)? We can get something like this:

scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
maximum number of transaction tries: 1 # default value
latency average = 9.630 ms
latency stddev = 13.366 ms

or this:

scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
maximum number of transaction tries: 3 # not the default value
latency average = 9.630 ms
latency stddev = 13.366 ms

> Also, percent character should be stuck to its number: 15.719% to have
> the style more homogeneous (although there seems to be pre-existing
> inhomogeneities).
>
> I would replace "transaction tries/retried" by "tries/retried",
> everything
> is about transactions in the report anyway.
>
> Without reading the documentation, the overall report semantics is
> unclear, especially given the absurd tps results I got with my first
> attempt, as failing transactions are counted as "processed".

I agree and I will fix it.

> For the detailed report, ISTM that I already said in the previous
> review that
> the 22 columns (?) indentation is much too large:
>
> 0.078 149 30886 UPDATE
> pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

You did not answer me in [1] about this:

>> Moreover the 25 characters
>> alignment is ugly, better use a much smaller alignment.
>
> The variables for the numbers of failures and retries are of type
> int64 since the variable for the total number of transactions has the
> same type. That's why the alignment is so large (as I understand it
> now, 20 characters are enough). Do you prefer floating alignments,
> depending on the maximum number of failures/retries for any command in
> any script?

> When giving the number of retries, I'd like to also have the average
> number of retries per attempted transaction, whether they succeeded or
> failed in the end.

Is this not the number of retries divided by the number of attempted
transactions (which is now printed in the main report)? Or do you mean
the average number of retries per script run?

> About the code:
>
> I'm at a loss with the 7 states added to the automaton, where I would
> have hoped that only 2 (eg RETRY & FAIL, or even less) would be
> enough.

Well, I will try to simplify it. As I see it now, there has to be a
separate code path for ending a failed transaction block:
- if a failure occurs while we are trying to end a failed transaction
block, that failure cannot be retried in any case;
- if we cannot retry a failure and there was a failed transaction
block, we must go to the next command after the block, and not to the
next command after the failure.

> I would hope for something like "if (some error) state = ERROR",
> then in "ERROR" do the stats, check whether it should be retried, and
> if
> so state = START... and we are done.

We discussed this question in [3] and [4]:

>>> I wondering whether the RETRY & FAILURE states could/should be
>>> merged:
>>>
>>> on RETRY:
>>> -> count retry
>>> -> actually retry if < max_tries (reset client state, jump to
>>> command)
>>> -> else count failure and skip to end of script
>>>
>>> The start and end of transaction detection seem expensive (malloc,
>>> ...) and assume a one statement per command (what about "BEGIN \; ...
>>> \; COMMIT;", which is not necessarily the case, this limitation
>>> should
>>> be documented. ISTM that the space normalization should be avoided,
>>> and something simpler/lighter should be devised? Possibly it should
>>> consider handling SAVEPOINT.
>>
>> I divided these states because if there's a failed transaction block
>> you
>> should end it before retrying. It means to go to states
>> CSTATE_START_COMMAND -> CSTATE_WAIT_RESULT -> CSTATE_END_COMMAND with
>> the appropriate command. How do you propose not to go to these states?
>
> Hmmm. Maybe I'm wrong. I'll think about it.

> I'm wondering whether the whole feature could be simplified by
> considering that one script is one "transaction" (it is from the
> report point of view at least), and that any retry is for the full
> script only, from its beginning. That would remove the trying to guess
> at transactions begin or end, avoid scanning manually for subcommands,
> and so on.
> - Would it make sense?
> - Would it be ok for your use case?

I'm not sure this makes sense: if we retry the script from the very
beginning, including its successful transactions, new errors may
occur...

> The proposed version of the code looks unmaintainable to me. There are
> 3 levels of nested "switch/case" with state changes at the deepest
> level.
> I cannot even see it on my screen which is not wide enough.

I understand your concern, although the 80-column limit is only broken
by long strings passed to printf functions. I will try to simplify this
by moving the 2nd and the 3rd levels of nested "switch/case" into a
separate function.

> There should be a typedef for "random_state", eg something like:
>
> typedef struct { unsigned short data[3]; } RandomState;
>
> Please keep "const" declarations, eg "commandFailed".
>
> I think that choosing the script should depend on the thread random
> state, not the client random state, so that a run would generate the
> same pattern per thread, independently of which client finishes first.
>
> I'm sceptical of the "--debug-fails" options. ISTM that --debug is
> already there
> and should just be reused.

Thanks, I agree with you and I'll fix this.
(Therefore I assume that both the thread state and the client state
will have a random state.)
How do you like the idea of creating several debugging levels?

> Also, you have changed some existing unrelated error messages with
> this option, especially in evalFunc, which is clearly not appropriate:
>
> - fprintf(stderr, "random range is too large\n");
> + if (debug_fails)
> + fprintf(stderr, "random range is too large\n");
>
> Please leave unrelated code as is.

I suppose this code is related. If we do not change it, a program run
that does not use debugging options can receive many error messages
without any client being aborted. Also, it will be very difficult to
see the progress reports among all this output.

> I agree that function naming style is a already a mess, but I think
> that
> new functions you add should use a common style, eg "is_compound" vs
> "canRetry".
>
> Translating error strings to their enum should be put in a function.

Ok, I will fix it.

> I do not believe in the normalize_whitespace function: ISTM that it
> would turn "SELECT LENGTH('   ');" into "SELECT LENGTH(' ');", which
> is not desirable. I do not think that it should be needed.

This function does not change the commands that are passed to the
server; this just simplifies the function is_tx_block_end and makes the
latter more understandable.

> I do not understand the "get subcommands of a compound command" strange
> re-parsing phase. There should be comments explaining what you want to
> achieve.

Well, I will write these comments. My reasons are as follows:
1) I need to know the number of non-empty subcommands for processing the
command in CSTATE_WAIT_RESULT;
2) I need this parser to correctly distinguish subcommands in such cases
as "SELECT ';';";
3) for any non-empty subcommand I need to know its initial sequence
number, it is used to report errors/failures;
4) if the compound command contains only one non-empty subcommand (for
example, the compound command ';;END;;'), it can be retried as a usual
command;
5) retrying of failures in the compound command depends on its
subcommands which start or complete the transaction block.

> I'm not sure about what happens if several begin/end appears
> on the line.

Maybe this section of the documentation [5] will help you?

> I'm not sure this whole thing should be done anyway.

Well, I will simplify it so that compound commands will not be retried
in any case; but even then the parser is needed for reasons 1, 2 and 3
in the list above ("My reasons are as follows: ...").

> About the tests:
>
> The 828-line perl script seems over-complicated, with pgbench & psql
> interacting together... Is all that really necessary? Isn't some
> (much) simpler test possible and sufficient?

I wrote to you about this at the beginning of this letter ("3) About the
tests. ...").

> The "node" is started but never stopped.

Thank you, I will fix it.

> For file contents, maybe the << 'EOF' here-document syntax would help
> instead
> of using concatenated backslashed strings everywhere.

Thanks, but I'm not sure that it is better to open file handles just to
print into variables...

[1]
https://www.postgresql.org/message-id/e4c5e8cefa4a8e88f1273b0f1ee29e56%40postgrespro.ru
[2]
https://www.postgresql.org/message-id/894ac29c18f12a4fd6f8c97c9123a152%40postgrespro.ru
[3]
https://www.postgresql.org/message-id/6ce8613f061001105654673506348c13%40postgrespro.ru
[4]
https://www.postgresql.org/message-id/alpine.DEB.2.20.1707131818200.20175%40lancre
[5]
https://www.postgresql.org/docs/devel/static/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
