pgbench bug candidate: negative "initial connection time"

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pgbench bug candidate: negative "initial connection time"
Date: 2021-06-11 08:58:45
Message-ID: TYCPR01MB5870057375ACA8A73099C649F5349@TYCPR01MB5870.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers,

I played pgbench with wrong parameters, and I found bug-candidate.
The latest commit in my source is 3a09d75.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. execute following command many times:

$ pgbench -c 101 -j 10 postgres

Then, sometimes the negative " initial connection time" was returned.
Lateyncy average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already
pgbench (PostgreSQL) 14.0
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 101
number of threads: 10
number of transactions per client: 10
number of transactions actually processed: 910/1010
latency average = 41387686.662 ms
initial connection time = -372896921.586 ms
tps = 0.002440 (without initial connection time)
```

I sought pgbench.c and found a reason.
When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun().
And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero),
the following if-statement sets bench_start to zero.

```
6494 /* first recorded benchmarking start time */
6495 if (bench_start == 0 || thread->bench_start < bench_start)
6496 bench_start = thread->bench_start;
```

The wrong bench_start propagates to printResult() and then some invalid values are appered.

```
6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration,
6510 bench_start - start_time, latency_late);
```

I cannot distinguish whether we have to fix it, but I attache the patch.
This simply ignores a result when therad->bench_start is zero.

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
ignore_failed_thread.patch application/octet-stream 543 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-06-11 09:49:15 Re: Race condition in recovery?
Previous Message Peter Smith 2021-06-11 08:34:48 Re: [HACKERS] logical decoding of two-phase transactions