Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Date: 2021-06-30 15:02:13
Message-ID: 20210701000213.bd91004653d86255979bbdc0@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Fabien,

On Sat, 26 Jun 2021 12:15:38 +0200 (CEST)
Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

>
> Hello Yugo-san,
>
> # About v12.2
>
> ## Compilation
>
> Patch seems to apply cleanly with "git apply", but does not compile on my
> host: "undefined reference to `conditional_stack_reset'".
>
> However it works better when using the "patch". I'm wondering why git
> apply fails silently…

Hmm, I don't know why your compiling fails... I can apply and complile
successfully using git.

> When compiling there are warnings about "pg_log_fatal", which does not
> expect a FILE* on pgbench.c:4453. Remove the "stderr" argument.

Ok.

> Global and local checks ok.
>
> > number of transactions failed: 324 (3.240%)
> > ...
> > number of transactions retried: 5629 (56.290%)
> > number of total retries: 103299
>
> I'd suggest: "number of failed transactions". "total number of retries" or
> just "number of retries"?

Ok. I fixed to use "number of failed transactions" and "total number of retries".

> ## Feature
>
> The overall code structure changes to implements the feature seems
> reasonable to me, as we are at the 12th iteration of the patch.
>
> Comments below are somehow about details and asking questions
> about choices, and commenting…
>
> ## Documentation
>
> There is a lot of documentation, which is good. I'll review these
> separatly. It looks good, but having a native English speaker/writer
> would really help!
>
> Some output examples do not correspond to actual output for
> the current version. In particular, there is always one TPS figure
> given now, instead of the confusing two shown before.

Fixed.

> ## Comments
>
> transactinos -> transactions.

Fixed.

> ## Code
>
> By default max_tries = 0. Should not the initialization be 1,
> as the documentation argues that it is the default?

Ok. I fixed the default value to 1.

> Counter comments, missing + in the formula on the skipped line.

Fixed.

> Given that we manage errors, ISTM that we should not necessarily
> stop on other not retried errors, but rather count/report them and
> possibly proceed. Eg with something like:
>
> -- server side random fail
> DO LANGUAGE plpgsql $$
> BEGIN
> IF RANDOM() < 0.1 THEN
> RAISE EXCEPTION 'unlucky!';
> END IF;
> END;
> $$;
>
> Or:
>
> -- client side random fail
> BEGIN;
> \if random(1, 10) <= 1
> SELECT 1 +;
> \else
> SELECT 2;
> \endif
> COMMIT;
>
> We could count the fail, rollback if necessary, and go on. What do you think?
> Maybe such behavior would deserve an option.

This feature to count failures that could occur at runtime seems nice. However,
as discussed in [1], I think it is better to focus to only failures that can be
retried in this patch, and introduce the feature to handle other failures in a
separate patch.

[1] https://www.postgresql.org/message-id/alpine.DEB.2.21.1809121519590.13887%40lancre

> --report-latencies -> --report-per-command: should we keep supporting
> the previous option?

Ok. Although now the option is not only for latencies, considering users who
are using the existing option, I'm fine with this. I got back this to the
previous name.

> --failures-detailed: if we bother to run with handling failures, should
> it always be on?

If we print other failures that cannot be retried in future, it could a lot
of lines and might make some users who don't need details of failures annoyed.
Moreover, some users would always need information of detailed failures in log,
and others would need only total numbers of failures.

Currently we handle only serialization and deadlock failures, so the number of
lines printed and the number of columns of logging is not large even under the
failures-detail, but if we have a chance to handle other failures in future,
ISTM adding this option makes sense considering users who would like simple
outputs.

> --debug-errors: I'm not sure we should want a special debug mode for that,
> I'd consider integrating it with the standard debug, or just for development.

I think --debug is a debug option for telling users the pgbench's internal
behaviors, that is, which client is doing what. On other hand, --debug-errors
is for telling users what error caused a retry or a failure in detail. For
users who are not interested in pgbench's internal behavior (sending a command,
receiving a result, ... ) but interested in actual errors raised during running
script, this option seems useful.

> Also, should it use pg_log_debug?

If we use pg_log_debug, the message is printed only under --debug.
Therefore, I fixed to use pg_log_info instead of pg_log_error or fprintf.

> doRetry: I'd separate the 3 no retries options instead of mixing max_tries and
> timer_exceeeded, for clarity.

Ok. I fixed to separate them.

> Tries vs retries: I'm at odds with having tries & retries and + 1 here
> and there to handle that, which is a little bit confusing. I'm wondering whether
> we could only count "tries" and adjust to report what we want later?

I fixed to use "tries" instead of "retries" in CState. However, we still use
"retries" in StatsData and Command because the number of retries is printed
in the final result. Is it less confusing than the previous?

> advanceConnectionState: ISTM that ERROR should logically be before others which
> lead to it.

Sorry, I couldn't understand your suggestion. Is this about the order of case
statements or pg_log_error?

> Variables management: it looks expensive, with copying and freeing variable arrays.
> I'm wondering whether we should think of something more clever. Well, that would be
> for some other patch.

Well.., indeed there may be more efficient way. For example, instead of clearing all
vars in dest, it might be possible to copy or clear only the difference part between
dest and source and remaining unchanged part in dest. Anyway, I think this work should
be done in other patch.

> "Accumulate the retries" -> "Count (re)tries"?

Fixed.

> Currently, ISTM that the retry on error mode is implicitely always on.
> Do we want that? I'd say yes, but maybe people could disagree.

The default values of max-tries is 1, so the retry on error is off.
Failed transactions are retried only when the user wants it and
specifies a valid value to max-treis.

> ## Tests
>
> There are tests, good!
>
> I'm wondering whether something simpler could be devised to trigger
> serialization or deadlock errors, eg with a SEQUENCE and an \if.
>
> See the attached files for generating deadlocks reliably (start with 2 clients).
> What do you think? The PL/pgSQL minimal, it is really client-code
> oriented.
>
> Given that deadlocks are detected about every seconds, the test runs
> would take some time. Let it be for now.

Sorry, but I cannot find the attached file. I don't have a good idea
for a simpler test for now, but I can fix the test based on your idea
after getting the file.

I attached the patch updated according with your suggestion.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
v13-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch text/x-diff 81.5 KB
v13-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch text/x-diff 15.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-06-30 15:03:12 Re: Dependency to logging in jsonapi.c
Previous Message Amit Langote 2021-06-30 14:58:23 Re: ExecRTCheckPerms() and many prunable partitions