Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Date: 2021-06-30 15:33:24
Message-ID: alpine.DEB.2.22.394.2106301716320.2013120@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Yugo-san,

Thanks for the update!

>> Patch seems to apply cleanly with "git apply", but does not compile on my
>> host: "undefined reference to `conditional_stack_reset'".
>> However it works better when using the "patch". I'm wondering why git
>> apply fails silently…
> Hmm, I don't know why your compiling fails... I can apply and complile
> successfully using git.

Hmmm. Strange!

>> Given that we manage errors, ISTM that we should not necessarily stop
>> on other not retried errors, but rather count/report them and possibly
>> proceed. Eg with something like: [...] We could count the fail,
>> rollback if necessary, and go on. What do you think? Maybe such
>> behavior would deserve an option.
> This feature to count failures that could occur at runtime seems nice. However,
> as discussed in [1], I think it is better to focus to only failures that can be
> retried in this patch, and introduce the feature to handle other failures in a
> separate patch.


>> --report-latencies -> --report-per-command: should we keep supporting
>> the previous option?
> Ok. Although now the option is not only for latencies, considering users who
> are using the existing option, I'm fine with this. I got back this to the
> previous name.

Hmmm. I liked the new name! My point was whether we need to support the
old one as well for compatibility, or whether we should not bother. I'm
still wondering. As I think that the new name is better, I'd suggest to
keep it.

>> --failures-detailed: if we bother to run with handling failures, should
>> it always be on?
> If we print other failures that cannot be retried in future, it could a lot
> of lines and might make some users who don't need details of failures annoyed.
> Moreover, some users would always need information of detailed failures in log,
> and others would need only total numbers of failures.


> Currently we handle only serialization and deadlock failures, so the number of
> lines printed and the number of columns of logging is not large even under the
> failures-detail, but if we have a chance to handle other failures in future,
> ISTM adding this option makes sense considering users who would like simple
> outputs.

Hmmm. What kind of failures could be managed with retries? I guess that on
a connection failure we can try to reconnect, but otherwise it is less
clear that other failures make sense to retry.

>> --debug-errors: I'm not sure we should want a special debug mode for that,
>> I'd consider integrating it with the standard debug, or just for development.
> I think --debug is a debug option for telling users the pgbench's internal
> behaviors, that is, which client is doing what. On other hand, --debug-errors
> is for telling users what error caused a retry or a failure in detail. For
> users who are not interested in pgbench's internal behavior (sending a command,
> receiving a result, ... ) but interested in actual errors raised during running
> script, this option seems useful.

Ok. The this is not really about debug per se, but a verbosity setting?
Maybe --verbose-errors would make more sense? I'm unsure. I'll think about

>> Also, should it use pg_log_debug?
> If we use pg_log_debug, the message is printed only under --debug.
> Therefore, I fixed to use pg_log_info instead of pg_log_error or fprintf.

Ok, pg_log_info seems right.

>> Tries vs retries: I'm at odds with having tries & retries and + 1 here
>> and there to handle that, which is a little bit confusing. I'm wondering whether
>> we could only count "tries" and adjust to report what we want later?
> I fixed to use "tries" instead of "retries" in CState. However, we still use
> "retries" in StatsData and Command because the number of retries is printed
> in the final result. Is it less confusing than the previous?

I'm going to think about it.

>> advanceConnectionState: ISTM that ERROR should logically be before others which
>> lead to it.
> Sorry, I couldn't understand your suggestion. Is this about the order of case
> statements or pg_log_error?

My sentence got mixed up. My point was about the case order, so that they
are put in a more logical order when reading all the cases.

>> Currently, ISTM that the retry on error mode is implicitely always on.
>> Do we want that? I'd say yes, but maybe people could disagree.
> The default values of max-tries is 1, so the retry on error is off.

> Failed transactions are retried only when the user wants it and
> specifies a valid value to max-treis.

Ok. My point is that we do not stop on such errors, whereas before ISTM
that we would have stopped, so somehow the default behavior has changed
and the previous behavior cannot be reinstated with an option. Maybe that
is not bad, but this is a behavioral change which needs to be documented
and argumented.

>> See the attached files for generating deadlocks reliably (start with 2
>> clients). What do you think? The PL/pgSQL minimal, it is really
>> client-code oriented.
> Sorry, but I cannot find the attached file.

Sorry. Attached to this mail. The serialization stuff does not seem to
work as well as the deadlock one. Run with 2 clients.


Attachment Content-Type Size
deadlock_prep.sql application/x-sql 686 bytes
deadlock.sql application/x-sql 814 bytes
serializable.sql application/x-sql 691 bytes

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-06-30 15:36:34 Re: trivial improvement to system_or_bail
Previous Message Alvaro Herrera 2021-06-30 15:24:33 trivial improvement to system_or_bail