| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | Rintaro Ikeda <ikedarintarof(at)oss(dot)nttdata(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: Suggestion to add --continue-client-on-abort option to pgbench |
| Date: | 2025-11-13 02:20:30 |
| Message-ID: | F31D4C1A-BF31-4010-A85B-1B61654FA6D5@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Nov 12, 2025, at 17:34, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> On Wed, 12 Nov 2025 00:20:15 +0900
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> On Tue, Nov 11, 2025 at 10:50 AM Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>>> I've attached a patch to fix this.
>>
>> Thanks for reporting the issue and providing the patch!
>>
>>> If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
>>> it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
>>> In this case, we should reset the receive_sync flag and continue discarding results.
>>
>> Yes.
>>
>> + if (res)
>> + {
>> + received_sync = false;
>> + continue;
>>
>> Shouldn't we also call PQclear(res) here? For example:
>
> Thank you for your review!
> Yes, we need PQclear() here.
>
> I've attached an updated patch.
>
> The comment for the PQpipelineSync() call has been also updated to clarify
> why it is necessary.
>
> In addition, I added a connection status check in the loop to avoid an
> infinte loop that waiting for PQpipelineSync after a connection failure.
>
> I packed these changes in the same patch, but they can be split into separate
> patches.
>
> What do you think?
>
> Regards,
> Yugo Nagata
>
> --
> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
> <v2-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patch>
I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
In my debug, I slightly updated Yugo’s script as: (every select returns a different value)
```
% cat pgbench_error.sql
\startpipeline
select 1/0;
\syncpipeline
select 2;
\syncpipeline
select 3;
\syncpipeline
select 4;
\endpipeline
```
Please see my dirty fix in the attachment. The diff is based master + Yugo’s v2 patch.
In my fix, I make discardAvailableResults() to return the PGRES_PIPELINE_SYNC it reads, and moved discardAvailableResults() out of getSQLErrorStatus(), so that if discardAvailableResults() returns a result, then use the result as next_res to continue the reading loop.
Here is my execution output:
```
% pgbench -n --failures-detailed --continue-on-error -M extended -t 5 -f pgbench_error.sql evantest
pgbench (19devel)
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
transaction type: pgbench_error.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 5
number of transactions actually processed: 5/5
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other failures: 0 (0.000%)
latency average = 0.265 ms
initial connection time = 2.092 ms
tps = 3773.584906 (without initial connection time)
```
You can see that, select 2/3/4 are properly handled.
Yugo-san, if you add some debug log, you will see that with your patch, 2 and 3 will be discarded by discardUntilSync(), so I don’t think your patch works.
To apply my dirty diff:
* git checkout master
* git am Yugo’s v2 patch
* git apply dirty-fix.diff
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| dirty-fix.diff | application/octet-stream | 6.3 KB |
| unknown_filename | text/plain | 3 bytes |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zhijie Hou (Fujitsu) | 2025-11-13 02:33:38 | RE: Newly created replication slot may be invalidated by checkpoint |
| Previous Message | Quan Zongliang | 2025-11-13 02:19:38 | Re: Doc: add XML ID attributes to <varlistentry> tags for create_foreign_table, alter_foreign_table |