From: | Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: pgbench bug candidate: negative "initial connection time" |
Date: | 2021-06-14 08:05:18 |
Message-ID: | 20210614170518.5c0fb098a6f9a159d920e326@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Kuroda-san,
On Fri, 11 Jun 2021 08:58:45 +0000
"kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> Hi Hackers,
>
> I played pgbench with wrong parameters, and I found bug-candidate.
> The latest commit in my source is 3a09d75.
>
> 1. Do initdb and start.
> 2. Initialize schema and data with "scale factor" = 1.
> 3. execute following command many times:
>
> $ pgbench -c 101 -j 10 postgres
>
> Then, sometimes the negative " initial connection time" was returned.
> Lateyncy average is also strange.
>
> ```
> $ pgbench -c 101 -j 10 postgres
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already
> pgbench (PostgreSQL) 14.0
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 1
> query mode: simple
> number of clients: 101
> number of threads: 10
> number of transactions per client: 10
> number of transactions actually processed: 910/1010
> latency average = 41387686.662 ms
> initial connection time = -372896921.586 ms
> tps = 0.002440 (without initial connection time)
> ```
>
> I sought pgbench.c and found a reason.
> When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun().
> And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero),
> the following if-statement sets bench_start to zero.
>
> ```
> 6494 /* first recorded benchmarking start time */
> 6495 if (bench_start == 0 || thread->bench_start < bench_start)
> 6496 bench_start = thread->bench_start;
> ```
>
> The wrong bench_start propagates to printResult() and then some invalid values are appered.
>
> ```
> 6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration,
> 6510 bench_start - start_time, latency_late);
> ```
>
> I cannot distinguish whether we have to fix it, but I attache the patch.
> This simply ignores a result when therad->bench_start is zero.
+ /* skip if the thread faild to get connection */
+ if (thread->bench_start == 0)
+ continue;
It detects if a thread failed to get the initial connection by thread->bench_start == 0, but this assumes the initial value is zero. For ensuring this, I think it is better to initialize it in an early state, for example like this.
@@ -6419,6 +6419,7 @@ main(int argc, char **argv)
initRandomState(&thread->ts_throttle_rs);
initRandomState(&thread->ts_sample_rs);
thread->logfile = NULL; /* filled in later */
+ thread->bench_start = 0;
thread->latency_late = 0;
initStats(&thread->stats, 0)
typo: faild -> failed
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
From | Date | Subject | |
---|---|---|---|
Next Message | Yugo NAGATA | 2021-06-14 08:07:02 | Re: pgbench bug candidate: negative "initial connection time" |
Previous Message | Yugo NAGATA | 2021-06-14 07:06:10 | Re: Avoid stuck of pbgench due to skipped transactions |