Re: few ideas for pgbench

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Fabien COELHO <fabien(dot)coelho(at)mines-paristech(dot)fr>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: few ideas for pgbench
Date: 2021-05-07 07:58:33
Message-ID: CAFj8pRCXtLCyYD2TN+DR9X0h4mNonZCxZQPqUv0fr69MUqq4UQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

pá 7. 5. 2021 v 9:46 odesílatel Fabien COELHO <
fabien(dot)coelho(at)mines-paristech(dot)fr> napsal:

>
> Hello,
>
> >>> When I run pgbench, I usually work with more releases together, so the
> >>> server version is important info.
> >>
> >> Ok. Yes.
>
> Here is a putative patch for this simple part.
>

+1

> >>> What about CSV? Any run can produce one row.
> >>
> >> Yep, CSV is simple and nice. It depends on what information you would
> >> like. For instance for progress report (-P 1) or logs/sampling (-l)
> would
> >> be relevant candidates for CSV. Not so much for the final report,
> though.
> >
> > I think so there can be almost all information. We have to ensure
> > consistency of columns.
> >
> > The basic usage can be
> >
> > for ....
> > do
> > pg_bench ... >> logfile
> > done
> >
> > and log file can looks like
> >
> > start time, rowno, serverver, clientver, connections, scale, readonly,
> > jobs, tps, latency, ...
> >
> > The header row can be optional
>
> Hmmm. It is less clear how to do that with minimal code impact on the
> code, as some options which change the report structure, eg when using
> multiple scripts (-b/-f) or having detailed per-op informations (-r), as
> show below:
>
> sh> pgbench -P 1 -T 10 -M prepared -c 2 -b se(at)9 -b si -r
> starting vacuum...end.
> progress: 1.0 s, 10666.9 tps, lat 0.186 ms stddev 0.454
> progress: 2.0 s, 9928.0 tps, lat 0.201 ms stddev 0.466
> progress: 3.0 s, 10314.8 tps, lat 0.193 ms stddev 0.469
> progress: 4.0 s, 10042.7 tps, lat 0.198 ms stddev 0.466
> progress: 5.0 s, 11084.3 tps, lat 0.180 ms stddev 0.408
> progress: 6.0 s, 9804.1 tps, lat 0.203 ms stddev 0.474
> progress: 7.0 s, 10271.5 tps, lat 0.194 ms stddev 0.463
> progress: 8.0 s, 10511.5 tps, lat 0.190 ms stddev 0.424
> progress: 9.0 s, 10005.7 tps, lat 0.199 ms stddev 0.501
> progress: 10.0 s, 10512.4 tps, lat 0.190 ms stddev 0.428
> pgbench (PostgreSQL) 14.0
> server version: 13.2
> transaction type: multiple scripts
> scaling factor: 1
> query mode: prepared
> number of clients: 2
> number of threads: 1
> duration: 10 s
> number of transactions actually processed: 103144
> latency average = 0.193 ms
> latency stddev = 0.455 ms
> initial connection time = 5.043 ms
> tps = 10319.361549 (without initial connection time)
> SQL script 1: <builtin: select only>
> - weight: 9 (targets 90.0% of total)
> - 92654 transactions (89.8% of total, tps = 9269.856947)
> - latency average = 0.052 ms
> - latency stddev = 0.018 ms
> - statement latencies in milliseconds:
> 0.000 \set aid random(1, 100000 * :scale)
> 0.052 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
> SQL script 2: <builtin: simple update>
> - weight: 1 (targets 10.0% of total)
> - 10490 transactions (10.2% of total, tps = 1049.504602)
> - latency average = 1.436 ms
> - latency stddev = 0.562 ms
> - statement latencies in milliseconds:
> 0.001 \set aid random(1, 100000 * :scale)
> 0.000 \set bid random(1, 1 * :scale)
> 0.000 \set tid random(1, 10 * :scale)
> 0.000 \set delta random(-5000, 5000)
> 0.027 BEGIN;
> 0.065 UPDATE pgbench_accounts SET abalance = abalance + :delta
> WHERE aid = :aid;
> 0.045 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
> 0.048 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
> VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
> 1.249 END
>
> The nature of columns would change depending on options, eg "initial
> connection time" does not make sense under -C, so that under a loop around
> pgbench scenario rows would not necessarily be consistent…
>
> Also, I'm not sure whether such a report can/should include all inputs
> options.
>
> Finally it is unclear how to add such a feature with minimal impact on the
> source code.

It is a question if this is possible without more changes or without
compatibility break :( Probably not. All output should be centralized.

> What I usually do is to put each pgbench run output in a separate file and
> write a small shell/perl/python script to process these, possibly
> generating CSV on the way.
>

The goal of my proposal was a reduction of necessity to write auxiliary
scripts. The produced document should not be "nice", but should be very
easy to import it to some analytical tools.

There is an analogy with Postgres's CSV logs. It is the same. We can see
the result of pgbench like some log.

Pavel

>
> --
> Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-05-07 08:09:05 Re: AlterSubscription_refresh "wrconn" wrong variable?
Previous Message Andrey Lepikhov 2021-05-07 07:23:00 Re: Removing unneeded self joins