RE: Suggestion to add --continue-client-on-abort option to pgbench

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Rintaro Ikeda' <ikedarintarof(at)oss(dot)nttdata(dot)com>, 'Yugo Nagata' <nagata(at)sraoss(dot)co(dot)jp>
Cc: "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Suggestion to add --continue-client-on-abort option to pgbench
Date: 2025-07-04 13:01:12
Message-ID: OSCPR01MB1496699E141E99B9D385C880DF542A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Ikeda-san, Nagata-san,

Thanks for updating the patch!

> > Could I confirm what you mean by "start new one"?
> >
> > In the current pgbench, when a query raises an error (a deadlock or
> > serialization failure), it can be retried using the same random state.
> > This typically means the query will be retried with the same parameter values.
> >
> > On the other hand, when the query ultimately fails (possibly after some retries),
> > the transaction is marked as a "failure", and the next transaction starts with a
> > new random state (i.e., with new parameter values).
> >
> > Therefore, if a query fails due to a unique constraint violation and is retried
> > with the same parameters, it will keep failing on each retry.
>
> Thank you for your explanation. I understand it as you described. I've also
> attached a schematic diagram of the state machine. I hope it will help clarify
> the behavior of pgbench. Red arrows represent the transition of state when SQL
> command fails and --continue-on-error option is specified.

Thanks for the diagram, it's quite helpful. Let me share my understanding and opinion.

The terminology "retry" is being used for the transition CSTATE_ERROR->CSTATE_RETRY,
and here the random state would be restored to be the begining:

```
/*
* Reset the random state as they were at the beginning of the
* transaction.
*/
st->cs_func_rs = st->random_state;
```

In --continue-on-error case, the transaction CSTATE_WAIT_RESULT->CSTATE_ERROR
can happen even the reason of failure is not serialization and deadlock.
Ultimately the pass will reach ...->CSTATE_END_TX->CSTATE_CHOOSE_SCRIPT, the
beginning of the state machine. cs_func_rs is not overwritten in the route so
that different random value would be generated, or even another script may be
chosen. Is it correct?

And I feel this behavior is OK. Most likely failure here is the unique constraint
violation. Clients should roll the dice again otherwise it would face the same
error again.

Below are my comments for the latest patch.

01.
```
$ git am ../patches/pgbench/v5-0001-Add-continue-on-error-opt
ion-to-pgbench.patch
Applying: When the option is set, client rolls back the failed transaction and...
.git/rebase-apply/patch:65: trailing whitespace.
<literal>serialization</literal>, <literal>deadlock</literal>, or
.git/rebase-apply/patch:139: trailing whitespace.
<option>--max-tries</option> option is not equal to 1 and
warning: 2 lines add whitespace errors.
```

I got warnings when I applied the patch. Please fix it.

02.
```
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got a error when continue-on-error option
```
The first line has the tab, but it should be normal blank.

03.
```
+ else if (continue_on_error | canRetryError(st->estatus))
```

I feel "|" should be "||".

04.
```
<term><replaceable>retries</replaceable></term>
<listitem>
<para>
number of retries after serialization or deadlock errors
(zero unless <option>--max-tries</option> is not equal to one)
</para>
</listitem>
```

To confirm; --continue-on-error won't be counted here because it is not "retry",
in other words, it does not reach CSTATE_RETRY, right?

Best regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-07-04 13:18:46 Re: Explicitly enable meson features in CI
Previous Message Andres Freund 2025-07-04 12:39:45 Re: cpluspluscheck vs ICU again