| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, Rintaro Ikeda <ikedarintarof(at)oss(dot)nttdata(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: Suggestion to add --continue-client-on-abort option to pgbench |
| Date: | 2025-11-13 05:14:37 |
| Message-ID: | 500B504D-265D-490C-9AE3-C340676F4FC9@gmail.com |
| Lists: | pgsql-hackers |
> On Nov 13, 2025, at 12:02, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
>> On Nov 13, 2025, at 11:47, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Thu, Nov 13, 2025 at 11:21 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>> I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(); instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
>>
>> Thanks for debugging!
>>
>> Yes, discardAvailableResults() can discard PGRES_PIPELINE_SYNC,
>> but do you mean that's the root cause of the assertion failure
>> Nagata-san reported?
>> Since that failure can occur even in older branches, I was thinking
>> that newer code like discardAvailableResults() in master isn't the root cause...
>>
>
> I haven’t debugged the old code, but it also discards non-NULL results:
>
> ```
> - do
> - {
> - res = PQgetResult(st->con);
> - PQclear(res);
> - } while (res);
> + discardAvailableResults(st);
> ```
>
> That loop may also discard the sync message. That’s my guess; I can debug the old code this afternoon.
>
I just tried the old code, but it didn’t trigger the assertion with Yugo’s deadlock scripts.
I ran `git reset --hard a3ea5330fcf47390c8ab420bbf433a97a54505d6`, the commit just before the one that added `--continue-on-error`, and then ran Yugo’s deadlock scripts, but the assertion did not fire:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
transaction type: multiple scripts
scaling factor: 1
query mode: extended
number of clients: 2
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
latency average = 0.341 ms
initial connection time = 2.637 ms
tps = 5865.102639 (without initial connection time)
SQL script 1: deadlock.sql
- weight: 1 (targets 50.0% of total)
- 12 transactions (60.0% of total)
- number of transactions actually processed: 12 (tps = 3519.061584)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 0.311 ms
- latency stddev = 0.304 ms
SQL script 2: deadlock2.sql
- weight: 1 (targets 50.0% of total)
- 8 transactions (40.0% of total)
- number of transactions actually processed: 8 (tps = 2346.041056)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 0.366 ms
- latency stddev = 0.364 ms
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/