| From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Rintaro Ikeda <ikedarintarof(at)oss(dot)nttdata(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: Suggestion to add --continue-client-on-abort option to pgbench |
| Date: | 2025-11-11 01:50:37 |
| Message-ID: | 20251111105037.f3fc554616bc19891f926c5b@sraoss.co.jp |
| Lists: | pgsql-hackers |
On Fri, 7 Nov 2025 18:33:17 +0900
Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I plan to commit the patch soon, but let's keep discussing and
> investigating the case you mentioned afterward!
I'm sorry for the late reply and for not joining the discussion earlier.
I've spent some time investigating the code in pgbench and libpq, and
your commit looks fine to me.
However, I found another issue related to the --continue-on-error option,
where an assertion failure occurs in the following test case:
$ cat pgbench_error.sql
\startpipeline
select 1/0;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ pgbench -f pgbench_error.sql -M extended --continue-on-error -T 1
pgbench (19devel)
starting vacuum...end.
pgbench: pgbench.c:3594: discardUntilSync: Assertion `res == ((void *)0)' failed.
Even after removing the Assert(), we get the following error:
pgbench: error: client 0 aborted: failed to exit pipeline mode for rolling back the failed transaction
This happens because discardUntilSync() does not expect a PGRES_TUPLES_OK result to be
received after \syncpipeline, and it also fails to discard all PGRES_PIPELINE_SYNC results
when multiple \syncpipeline commands are used.
I've attached a patch to fix this.
If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
In this case, we should reset the receive_sync flag and continue discarding results.
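The discard rule described above can be sketched as a small standalone simulation. This is not the attached patch and it does not call libpq: the SimResult enum, the saw_sync flag, and discard_until_last_sync() are hypothetical names invented for illustration, and the result stream is modeled as a plain array rather than read from PQgetResult().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for what PQgetResult() can hand back. */
typedef enum
{
    SIM_NULL,           /* PQgetResult() returned NULL */
    SIM_TUPLES_OK,      /* a PGRES_TUPLES_OK result */
    SIM_PIPELINE_SYNC   /* a PGRES_PIPELINE_SYNC result */
} SimResult;

/*
 * Walk the simulated result stream and discard everything up to and
 * including the NULL that directly follows a sync.
 *
 * Returns the number of entries consumed, or -1 if the stream ends
 * before a sync is directly followed by NULL.
 */
static int
discard_until_last_sync(const SimResult *stream, size_t n)
{
    bool saw_sync = false;   /* was the previous entry a sync? */

    assert(stream != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (stream[i] == SIM_PIPELINE_SYNC)
            saw_sync = true;
        else if (saw_sync)
        {
            /* A sync directly followed by NULL: nothing more is pending. */
            if (stream[i] == SIM_NULL)
                return (int) (i + 1);

            /*
             * Something else followed the sync, so another sync must
             * arrive later: reset the flag and keep discarding.
             */
            saw_sync = false;
        }
        /* Entries seen while saw_sync is false are simply discarded. */
    }
    return -1;
}
```

For an assumed stream of sync, tuples, NULL, sync, tuples, NULL, sync, NULL, the function consumes all eight entries. A loop that asserted NULL right after the first sync would stop at the second entry instead.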
I think this fix should be back-patched, since this is not a bug introduced by
--continue-on-error. The same assertion failure occurs in the following test case,
where transactions are retried after a deadlock error:
$ cat deadlock.sql
\startpipeline
select * from a order by i for update;
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ cat deadlock2.sql
\startpipeline
select * from a order by i desc for update;
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ pgbench -f deadlock.sql -f deadlock2.sql -j 2 -c 2 -M extended
Regards,
Yugo Nagata
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Make-sure-discardUntilSync-discards-until-the-last-s.patch | text/x-diff | 1.5 KB |