Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Date: 2021-07-07 23:35:06
Message-ID: 20210708083506.30fa700e2432d6da81c59393@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Fabien,

I attached the updated patch (v14)!

On Wed, 30 Jun 2021 17:33:24 +0200 (CEST)
Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

> >> --report-latencies -> --report-per-command: should we keep supporting
> >> the previous option?
> >
> > Ok. Although now the option is not only for latencies, considering users who
> > are using the existing option, I'm fine with this. I got back this to the
> > previous name.
>
> Hmmm. I liked the new name! My point was whether we need to support the
> old one as well for compatibility, or whether we should not bother. I'm
> still wondering. As I think that the new name is better, I'd suggest to
> keep it.

Ok. I misunderstood it. I returned the option name to report-per-command.

If we keep report-latencies, I can imagine the following choises:
- use report-latencies to print only latency information
- use report-latencies as alias of report-per-command for compatibility
and remove at an appropriate timing. (that is, treat as deprecated)

Among these, I prefer the latter because ISTM we would not need many options
for reporting information per command. However, actually, I wander that we
don't have to keep the previous one if we plan to remove it eventually.

> >> --failures-detailed: if we bother to run with handling failures, should
> >> it always be on?
> >
> > If we print other failures that cannot be retried in future, it could a lot
> > of lines and might make some users who don't need details of failures annoyed.
> > Moreover, some users would always need information of detailed failures in log,
> > and others would need only total numbers of failures.
>
> Ok.
>
> > Currently we handle only serialization and deadlock failures, so the number of
> > lines printed and the number of columns of logging is not large even under the
> > failures-detail, but if we have a chance to handle other failures in future,
> > ISTM adding this option makes sense considering users who would like simple
> > outputs.
>
> Hmmm. What kind of failures could be managed with retries? I guess that on
> a connection failure we can try to reconnect, but otherwise it is less
> clear that other failures make sense to retry.

Indeed, there would few failures that we should retry and I can not imagine
other than serialization , deadlock, and connection failures for now. However,
considering reporting the number of failed transaction and its causes in future,
as you said

> Given that we manage errors, ISTM that we should not necessarily
> stop on other not retried errors, but rather count/report them and
> possibly proceed.

, we could define more a few kind of failures. At least we can consider
meta-command and other SQL commands errors in addition to serialization,
deadlock, connection failures. So, the total number of kind of failures would
be five at least and reporting always all of them results a lot of lines and
columns in logging.

> >> --debug-errors: I'm not sure we should want a special debug mode for that,
> >> I'd consider integrating it with the standard debug, or just for development.
> >
> > I think --debug is a debug option for telling users the pgbench's internal
> > behaviors, that is, which client is doing what. On other hand, --debug-errors
> > is for telling users what error caused a retry or a failure in detail. For
> > users who are not interested in pgbench's internal behavior (sending a command,
> > receiving a result, ... ) but interested in actual errors raised during running
> > script, this option seems useful.
>
> Ok. The this is not really about debug per se, but a verbosity setting?

I think so.

> Maybe --verbose-errors would make more sense? I'm unsure. I'll think about
> it.

Agreed. This seems more proper than the previous one, so I fixed the name to
--verbose-errors.

> > Sorry, I couldn't understand your suggestion. Is this about the order of case
> > statements or pg_log_error?
>
> My sentence got mixed up. My point was about the case order, so that they
> are put in a more logical order when reading all the cases.

Ok. Considering the loical order, I moved WAIT_ROLLBACK_RESULT into
between ERROR and RETRY, because WAIT_ROLLBACK_RESULT comes atter ERROR state,
and RETRY comes after ERROR or WAIT_ROLLBACK_RESULT..

> >> Currently, ISTM that the retry on error mode is implicitely always on.
> >> Do we want that? I'd say yes, but maybe people could disagree.
> >
> > The default values of max-tries is 1, so the retry on error is off.
>
> > Failed transactions are retried only when the user wants it and
> > specifies a valid value to max-treis.
>
> Ok. My point is that we do not stop on such errors, whereas before ISTM
> that we would have stopped, so somehow the default behavior has changed
> and the previous behavior cannot be reinstated with an option. Maybe that
> is not bad, but this is a behavioral change which needs to be documented
> and argumented.

I understood. Indeed, there is a behavioural change about whether we abort
the client after some types of errors or not. Now, serialization / deadlock
errors don't cause the abortion and are recorded as failures whereas other
errors cause to abort the client.

If we would want to record other errors as failures in future, we would need
a new option to specify which type of failures (or all types of errors, maybe)
should be reported. Until that time, ISTM we can treat serialization and
deadlock as something special errors to be reported as failures.

I rewrote "Failures and Serialization/Deadlock Retries" section a bit to
emphasis that such errors are treated differently than other errors.

> >> See the attached files for generating deadlocks reliably (start with 2
> >> clients). What do you think? The PL/pgSQL minimal, it is really
> >> client-code oriented.
> >
> > Sorry, but I cannot find the attached file.
>
> Sorry. Attached to this mail. The serialization stuff does not seem to
> work as well as the deadlock one. Run with 2 clients.

Hmmm, your test didn't work well for me. Both tests got stuck in
pgbench_deadlock_wait() and pgbench didn't finish.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
v14-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch text/x-diff 81.9 KB
v14-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch text/x-diff 15.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2021-07-08 00:33:41 Re: row filtering for logical replication
Previous Message Peter Geoghegan 2021-07-07 22:50:48 Re: [PoC] Improve dead tuple storage for lazy vacuum