Re: Suggestion to add --continue-client-on-abort option to pgbench

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Rintaro Ikeda <ikedarintarof(at)oss(dot)nttdata(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "slpmcf(at)gmail(dot)com" <slpmcf(at)gmail(dot)com>, "boekewurm+postgres(at)gmail(dot)com" <boekewurm+postgres(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: Suggestion to add --continue-client-on-abort option to pgbench
Date: 2025-11-13 06:54:28
Message-ID: 952785F1-A347-4E02-B4AF-B0B42C9ABAFE@gmail.com
Lists: pgsql-hackers

> On Nov 13, 2025, at 13:50, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
>
> To trigger a deadlock error, the tables need to have enough rows so that the scan takes some
> time. In my environment, about 1,000 rows were enough to cause a deadlock.
>

Yes, after inserting 1000 rows, the assertion was triggered. I added some logging to track the results being read:

```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
EVAN: on error discard: Got result: res=11, conn=0
EVAN: on error discard: Got result: res=7, conn=0
EVAN: discardUntilSync: Got result: res=10, conn=0 <== received sync
EVAN: discardUntilSync: Got sync, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0 <== then immediately received result of next select, without a null res in between
EVAN: discardUntilSync: Got result value: 2, conn=0
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3579.
zsh: abort pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f
```

It looks like no NULL result follows the PGRES_PIPELINE_SYNC result.

So the code comment seems to be inaccurate:
```
/*
 * PGRES_PIPELINE_SYNC must be followed by another
 * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
 */
Assert(res == NULL);
```

Then I made a quick-and-dirty change that returns from discardUntilSync() as soon as it receives a SYNC:
```
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
{
    printf("EVAN: discardUntilSync: Got sync, conn=%d\n",
           PQstatus(st->con));
    received_sync = true;
    st->num_syncs = 0;
    PQclear(res);
    break;
}
```
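
To make the difference easier to see without a running server, here is a minimal mock of the idea, not the actual pgbench code. MockStatus, MockStream, mock_get_result() and discard_until_sync() are all hypothetical names made up for this sketch; only the numeric values are taken from libpq's ExecStatusType (2 = PGRES_TUPLES_OK, 7 = PGRES_FATAL_ERROR, 10 = PGRES_PIPELINE_SYNC, 11 = PGRES_PIPELINE_ABORTED), so they match the "res=" numbers in the log above. The loop stops at the sync marker and reads nothing after it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mock result statuses. The numeric values mirror libpq's ExecStatusType
 * so they line up with the "res=" numbers in the logs; MOCK_NULL is a
 * made-up marker standing in for PQgetResult() returning NULL.
 */
typedef enum
{
    MOCK_NULL = -1,
    MOCK_TUPLES_OK = 2,
    MOCK_FATAL_ERROR = 7,
    MOCK_PIPELINE_SYNC = 10,
    MOCK_PIPELINE_ABORTED = 11
} MockStatus;

/* Hypothetical stand-in for a connection's queue of pending results. */
typedef struct
{
    const MockStatus *items;
    size_t      len;
    size_t      pos;
} MockStream;

/* Fetch the next mock result; returns false once the stream is exhausted. */
static bool
mock_get_result(MockStream *s, MockStatus *out)
{
    assert(s != NULL && out != NULL);
    if (s->pos >= s->len)
        return false;
    *out = s->items[s->pos++];
    return true;
}

/*
 * Discard results until num_syncs sync markers have been seen, then stop
 * immediately without reading anything after the last sync.  Returns false
 * if the stream ends before enough syncs arrive.
 */
static bool
discard_until_sync(MockStream *s, int num_syncs)
{
    MockStatus  st;

    while (num_syncs > 0)
    {
        if (!mock_get_result(s, &st))
            return false;
        if (st == MOCK_PIPELINE_SYNC)
            num_syncs--;
        /* anything else (errors, aborted results, NULLs) is just skipped */
    }
    return true;
}
```

Fed the sequence from my log (MOCK_PIPELINE_SYNC, then MOCK_TUPLES_OK, then MOCK_NULL) with num_syncs = 1, this stops with pos == 1, so the result of the next SELECT is left for the caller instead of being consumed and tripping an assertion.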

That eliminates the assertion failure:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
EVAN: on error discard: Got result: res=11, conn=0
EVAN: on error discard: Got result: res=7, conn=0
EVAN: discardUntilSync: Got result: res=10, conn=0
EVAN: discardUntilSync: Got sync, conn=0
pgbench: error: client 0 aborted: failed to exit pipeline mode for rolling back the failed transaction
transaction type: multiple scripts
scaling factor: 1
query mode: extended
number of clients: 2
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
latency average = 203.933 ms
initial connection time = 3.006 ms
tps = 9.807152 (without initial connection time)
SQL script 1: deadlock.sql
- weight: 1 (targets 50.0% of total)
- 8 transactions (80.0% of total)
- number of transactions actually processed: 8 (tps = 7.845722)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 127.115 ms
- latency stddev = 332.002 ms
SQL script 2: deadlock2.sql
- weight: 1 (targets 50.0% of total)
- 2 transactions (20.0% of total)
- number of transactions actually processed: 2 (tps = 1.961430)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 1.347 ms
- latency stddev = 0.207 ms
pgbench: error: Run was aborted; the above results are incomplete.
```

So I think the key question now is whether a NULL result must always follow PGRES_PIPELINE_SYNC.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
