Re: Suggestion to add --continue-client-on-abort option to pgbench

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Rintaro Ikeda <ikedarintarof(at)oss(dot)nttdata(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: Suggestion to add --continue-client-on-abort option to pgbench
Date: 2025-11-13 02:20:30
Message-ID: F31D4C1A-BF31-4010-A85B-1B61654FA6D5@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 12, 2025, at 17:34, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> On Wed, 12 Nov 2025 00:20:15 +0900
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> On Tue, Nov 11, 2025 at 10:50 AM Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>>> I've attached a patch to fix this.
>>
>> Thanks for reporting the issue and providing the patch!
>>
>>> If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
>>> it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
>>> In this case, we should reset the receive_sync flag and continue discarding results.
>>
>> Yes.
>>
>> + if (res)
>> + {
>> + received_sync = false;
>> + continue;
>>
>> Shouldn't we also call PQclear(res) here? For example:
>
> Thank you for your review!
> Yes, we need PQclear() here.
>
> I've attached an updated patch.
>
> The comment for the PQpipelineSync() call has been also updated to clarify
> why it is necessary.
>
> In addition, I added a connection status check in the loop to avoid an
> infinte loop that waiting for PQpipelineSync after a connection failure.
>
> I packed these changes in the same patch, but they can be split into separate
> patches.
>
> What do you think?
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
> <v2-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patch>

I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.

In my debug, I slightly updated Yugo’s script as: (every select returns a different value)
```
% cat pgbench_error.sql
\startpipeline
select 1/0;
\syncpipeline
select 2;
\syncpipeline
select 3;
\syncpipeline
select 4;
\endpipeline
```

Please see my dirty fix in the attachment. The diff is based master + Yugo’s v2 patch.

In my fix, I make discardAvailableResults() to return the PGRES_PIPELINE_SYNC it reads, and moved discardAvailableResults() out of getSQLErrorStatus(), so that if discardAvailableResults() returns a result, then use the result as next_res to continue the reading loop.

Here is my execution output:
```
% pgbench -n --failures-detailed --continue-on-error -M extended -t 5 -f pgbench_error.sql evantest
pgbench (19devel)
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
transaction type: pgbench_error.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 5
number of transactions actually processed: 5/5
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other failures: 0 (0.000%)
latency average = 0.265 ms
initial connection time = 2.092 ms
tps = 3773.584906 (without initial connection time)
```

You can see that, select 2/3/4 are properly handled.

Yugo-san, if you add some debug log, you will see that with your patch, 2 and 3 will be discarded by discardUntilSync(), so I don’t think your patch works.

To apply my dirty diff:

* git checkout master
* git am Yugo’s v2 patch
* git apply dirty-fix.diff

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment Content-Type Size
dirty-fix.diff application/octet-stream 6.3 KB
unknown_filename text/plain 3 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2025-11-13 02:33:38 RE: Newly created replication slot may be invalidated by checkpoint
Previous Message Quan Zongliang 2025-11-13 02:19:38 Re: Doc: add XML ID attributes to <varlistentry> tags for create_foreign_table, alter_foreign_table