Re: Suggestion to add --continue-client-on-abort option to pgbench

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Rintaro Ikeda <ikedarintarof(at)oss(dot)nttdata(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: Suggestion to add --continue-client-on-abort option to pgbench
Date: 2025-09-25 07:22:16
Message-ID: 20250925162216.9d2f1ba3683f85183510d2c5@sraoss.co.jp
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 25 Sep 2025 13:49:05 +0900
Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Thu, Sep 25, 2025 at 11:17 AM Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> >
> > On Thu, 25 Sep 2025 11:09:40 +0900
> > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> >
> > > On Thu, 25 Sep 2025 02:19:27 +0900
> > > Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > >
> > > > On Tue, Sep 23, 2025 at 11:58 AM Rintaro Ikeda
> > > > <ikedarintarof(at)oss(dot)nttdata(dot)com> wrote:
> > > > > I think the issue is a new bug because we have transitioned to CSTATE_ABORT
> > > > > immediately after queries failed, without executing discardUntilSync().
> > >
> > > Agreed.
> > >
> > > > If so, the fix in discardUntilSync() should be part of the continue-on-error
> > > > patch. The assertion failure fix should be a separate patch, since only
> > > > that needs to be backpatched (the failure can also occur in back branches).
> > >
> > > +1
> > >
> > > >
> > > > > I've attached a patch that fixes the assertion error. The content of v1 patch by
> > > > > Mr. Nagata is also included. I would appreciate it if you review my patch.
> > >
> > > > + switch (PQresultStatus(res))
> > > > + {
> > > > + case PGRES_PIPELINE_SYNC:
> > > > + received_sync = true;
> > > >
> > > > In the PGRES_PIPELINE_SYNC case, PQclear(res) isn't called but should be.
> > > >
> > > > + case PGRES_FATAL_ERROR:
> > > > + PQclear(res);
> > > > + goto done;
> > > >
> > > > I don't think we need goto here. How about this instead?
> > > >
> > > > -----------------------
> > > > @@ -3525,11 +3525,18 @@ discardUntilSync(CState *st)
> > > > * results have been discarded.
> > > > */
> > > > st->num_syncs = 0;
> > > > - PQclear(res);
> > > > break;
> > > > }
> > > > + /*
> > > > + * Stop receiving further results if PGRES_FATAL_ERROR
> > > > is returned
> > > > + * (e.g., due to a connection failure) before
> > > > PGRES_PIPELINE_SYNC,
> > > > + * since PGRES_PIPELINE_SYNC will never arrive.
> > > > + */
> > > > + else if (PQresultStatus(res) == PGRES_FATAL_ERROR)
> > > > + break;
> > > > PQclear(res);
> > > > }
> > > > + PQclear(res);
> > > >
> > > > /* exit pipeline */
> > > > if (PQexitPipelineMode(st->con) != 1)
> > > > -----------------------
> > >
> > > I think Fujii-san's version is better because Ikeda-san's version doesn't
> > > consider the case where PGRES_PIPELINE_SYNC is followed by another one.
> > > In that situation, the loop would terminate without getting NULL, which would
> > > causes an error in PQexitPipelineMode().
> > >
> > > However, I would like to suggest an alternative solution: checking the connection
> > > status when readCommandResponse() returns false. This seems more straightforwad,
> > > since the cause of the error can be investigated immediately after it is detected.
> > >
> > > @@ -4024,7 +4043,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
> > > if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
> > > st->state = CSTATE_END_COMMAND;
> > > }
> > > - else if (canRetryError(st->estatus))
> > > + else if (PQstatus(st->con) == CONNECTION_BAD)
> > > + st->state = CSTATE_ABORTED;
> > > + else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
> > > + canRetryError(st->estatus))
> > > st->state = CSTATE_ERROR;
> > > else
> > > st->state = CSTATE_ABORTED;
> > >
> > > What do you think?
> > >
> > >
> > > Additionally, I noticed that in pipeline mode, the error message reported in
> > > readCommandResponse() is lost, because it is reset when PQgetResult() returned
> > > NULL to indicate the end of query processing. For example:
> > >
> > > pgbench: client 0 got an error in command 3 (SQL) of script 0;
> > > pgbench: client 1 got an error in command 3 (SQL) of script 0;
> > >
> > > This can be fixed this by saving the previous error message and using it
> > > for the report. After the fix:
> > >
> > > pgbench: client 0 got an error in command 3 (SQL) of script 0; FATAL: terminating connection due to administrator command
> > >
> > > I've attached update patches.
> > >
> > > 0001 fixes the assersion failure in commandError() and error message lost
> > > in readCommandResponse().
> > >
> > > 0002 was the previous 0001 that adds --continiue-on-error, including the
> > > fix to handle connection termination errors.
> > >
> > > 0003 is for improving error messages for errors that cause client abortion.
> >
> > I think the patch 0001 should be back patched since the issues can occurs
> > even for retries of serialization failure or deadlock detection in pipeline mode.
>
> Agreed.
>
> Regarding 0001:
>
> + /*
> + Errors should only be detected during an SQL command or the \endpipeline
> + meta command. Any other case triggers an assertion failure.
> + */
>
> * should be added before "Errors" and "meta".

Oops. Fixed.

> + errmsg = pg_strdup(PQerrorMessage(st->con));
>
> It would be good to add a comment explaining why we do this.
>
> + pg_free(errmsg);
>
> Shouldn't pg_free() be called also in the error case, i.e., after
> jumping to the error label?

Yes, it should be.
Fixed.

I've attached updated patches.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
v13-0003-Improve-error-messages-for-errors-that-cause-cli.patch text/x-diff 3.0 KB
v13-0002-Add-continue-on-error-option.patch text/x-diff 14.9 KB
v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patch text/x-diff 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2025-09-25 07:23:33 RE: Remove obsolate comments from 047_checkpoint_physical_slot
Previous Message Shubham Khanna 2025-09-25 07:07:24 Re: Add support for specifying tables in pg_createsubscriber.