Re: pgbench bug candidate: negative "initial connection time"

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench bug candidate: negative "initial connection time"
Date: 2021-06-14 08:05:18
Message-ID: 20210614170518.5c0fb098a6f9a159d920e326@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Kuroda-san,

On Fri, 11 Jun 2021 08:58:45 +0000
"kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:

> Hi Hackers,
>
> I played pgbench with wrong parameters, and I found bug-candidate.
> The latest commit in my source is 3a09d75.
>
> 1. Do initdb and start.
> 2. Initialize schema and data with "scale factor" = 1.
> 3. execute following command many times:
>
> $ pgbench -c 101 -j 10 postgres
>
> Then, sometimes the negative " initial connection time" was returned.
> Lateyncy average is also strange.
>
> ```
> $ pgbench -c 101 -j 10 postgres
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already
> pgbench (PostgreSQL) 14.0
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 1
> query mode: simple
> number of clients: 101
> number of threads: 10
> number of transactions per client: 10
> number of transactions actually processed: 910/1010
> latency average = 41387686.662 ms
> initial connection time = -372896921.586 ms
> tps = 0.002440 (without initial connection time)
> ```
>
> I sought pgbench.c and found a reason.
> When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun().
> And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero),
> the following if-statement sets bench_start to zero.
>
> ```
> 6494 /* first recorded benchmarking start time */
> 6495 if (bench_start == 0 || thread->bench_start < bench_start)
> 6496 bench_start = thread->bench_start;
> ```
>
> The wrong bench_start propagates to printResult() and then some invalid values are appered.
>
> ```
> 6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration,
> 6510 bench_start - start_time, latency_late);
> ```
>
> I cannot distinguish whether we have to fix it, but I attache the patch.
> This simply ignores a result when therad->bench_start is zero.

+ /* skip if the thread faild to get connection */
+ if (thread->bench_start == 0)
+ continue;

It detects if a thread failed to get the initial connection by thread->bench_start == 0, but this assumes the initial value is zero. For ensuring this, I think it is better to initialize it in an early state, for example like this.

@@ -6419,6 +6419,7 @@ main(int argc, char **argv)
initRandomState(&thread->ts_throttle_rs);
initRandomState(&thread->ts_sample_rs);
thread->logfile = NULL; /* filled in later */
+ thread->bench_start = 0;
thread->latency_late = 0;
initStats(&thread->stats, 0)

typo: faild -> failed

Regards,
Yugo Nagata
--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-06-14 08:07:02 Re: pgbench bug candidate: negative "initial connection time"
Previous Message Yugo NAGATA 2021-06-14 07:06:10 Re: Avoid stuck of pbgench due to skipped transactions