Re: CPU costs of random_zipfian in pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-02-19 15:14:06
Message-ID: alpine.DEB.2.21.1902191137030.7308@lancre
Lists: pgsql-hackers


Hello Peter,

My 0.02€: I'm not really interested in maintaining a tool for *one*
benchmark, whatever the benchmark, however standard or good it may be.

What I like in "pgbench" is that it is both versatile and simple, so that
people can benchmark their own data with their own load and their own
queries by writing a few lines of trivial SQL and psql-like slash commands
and adjusting a few options, then extract meaningful statistics out of it.

I have been improving it, and I am not the only one, so that it keeps its
simplicity of use but provides the key features needed for anyone to write
a simple but realistic benchmark.

The key features needed for that, which happen to be nearly all there
now, are:
- some expressions (thanks Roberts for the initial push)
- non-uniform random (ok, some are more expensive, too bad);
  however, using non-uniform random generates a correlation issue,
  hence the permutation function submission, which took time because
  this is a non-trivial problem.
- conditionals (\if, taken from psql's implementation)
- getting a result out and being able to do something with it
  (\gset, and the associated \cset that Tom does not like).
- improved reporting (including around latency, per script/command/...)
- realistic loads (--rate vs only pedal-to-the-metal runs, --latency-limit)
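
To illustrate the correlation issue mentioned in the list above, here is a
small Python sketch (not pgbench code). A skewed draw makes the hottest keys
neighbours of each other, so they tend to land on the same pages; composing
the draw with a bijection scatters them. The affine map used here is only a
stand-in for the idea and is not the submitted permutation function; the
names random_exponential and affine_permute and all the constants are
assumptions made for the example.

```python
import math
import random
from collections import Counter

N = 1000                 # number of keys (illustrative)
PARAM = 10.0             # skew of the exponential draw (illustrative)
MULT, SHIFT = 7919, 123  # hypothetical constants; MULT must be coprime with N


def random_exponential(rng, n, param):
    """Truncated exponential draw in [0, n), skewed toward 0."""
    cut = math.exp(-param)
    u = 1.0 - rng.random()                           # uniform in (0, 1]
    frac = -math.log(cut + (1.0 - cut) * u) / param  # in [0, 1)
    return int(n * frac)


def affine_permute(i, n):
    """Bijection on [0, n) as long as gcd(MULT, n) == 1."""
    assert math.gcd(MULT, n) == 1
    return (MULT * i + SHIFT) % n


def hot_keys(counter, k=10):
    """The k most frequently drawn keys, sorted by key value."""
    return sorted(key for key, _ in counter.most_common(k))


rng = random.Random(42)
draws = [random_exponential(rng, N, PARAM) for _ in range(20000)]

raw_hot = hot_keys(Counter(draws))
perm_hot = hot_keys(Counter(affine_permute(d, N) for d in draws))

# Distance between the smallest and largest hot key in each case.
spread_raw = raw_hot[-1] - raw_hot[0]
spread_perm = perm_hot[-1] - perm_hot[0]

print("hot keys without permutation:", raw_hot)
print("hot keys with permutation:   ", perm_hot)
```

An affine map is cheap but keeps a regular stride between hot keys, which
hints at why a good permutation is harder to get right than it looks.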

I have not encountered other tools with this versatility and simplicity.
The TPC-C implementation you point out and others I have seen are
structurally targeted at TPC-C and nothing else. I do not care about
TPC-C per se, I care about people being able to run relevant benchmarks
with minimal effort.

I'm not planning to submit many things in the future (current: a
strict-tpcb implementation which is really a showcase of the existing
features, faster server-side initialization, simple refactoring to
simplify/clarify the code structure here and there, maybe some stuff may
migrate to fe_utils if useful to psql), and I intend to review what other
people find useful because I know the code base quite well.

I do think that the maintainability of the code has globally been improved
recently because (1) the process-based implementation has been dropped and
(2) the FSA implementation makes the code easier to understand and check,
compared to the lengthy plenty-of-if many-variables function used
beforehand. Bugs have been identified and fixed.

> I agree that pgbench is too complex, given its mandate and design.
> While I found Zipfian useful once or twice, I probably would have done
> just as well with an exponential distribution.

Yep, I agree that exponential is mostly okay for most practical
benchmarking uses, but some benchmarks/people seem to really want zipf, so
zipf and its intrinsic underlying complexity were submitted and finally
included.

> I have been using BenchmarkSQL as a fair-use TPC-C implementation for
> my indexing project, with great results. pgbench just isn't very
> useful when validating the changes to B-Tree page splits that I
> propose, because the insertion pattern cannot be modeled
> probabilistically.

I do not understand the use case, and why pgbench could not be used for
this purpose.

> Besides, I really think that things like latency graphs are table stakes
> for this kind of work, which BenchmarkSQL offers out of the box. It
> isn't practical to make pgbench into a framework, which is what I'd
> really like to see. There just isn't that much more than can be done
> there.

Yep. Pgbench only does "simple stats". I script around the per-second
progress output for graphical display and additional stats (e.g. five-number
summary…).
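
As an illustration of that kind of scripting, here is a minimal Python
sketch that computes a five-number summary of tps from progress lines. It
assumes the lines look like "progress: 1.0 s, 100.0 tps, lat 9.5 ms stddev
1.2"; the exact format may differ between pgbench versions, the sample lines
below are made up for the example, and the function names are hypothetical.

```python
import re
import statistics

# Assumed shape of a pgbench progress line; adjust if your version differs.
PROGRESS_RE = re.compile(
    r"progress: (?P<t>[\d.]+) s, (?P<tps>[\d.]+) tps, "
    r"lat (?P<lat>[\d.]+) ms stddev (?P<sd>[\d.]+)"
)


def parse_progress(lines):
    """Return one dict of floats per line that matches the assumed format."""
    rows = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            rows.append({k: float(v) for k, v in m.groupdict().items()})
    return rows


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Made-up sample input, only there to exercise the parser.
sample = [
    "progress: 1.0 s, 100.0 tps, lat 9.5 ms stddev 1.2",
    "progress: 2.0 s, 200.0 tps, lat 5.1 ms stddev 0.9",
    "progress: 3.0 s, 300.0 tps, lat 3.4 ms stddev 0.7",
    "progress: 4.0 s, 400.0 tps, lat 2.6 ms stddev 0.5",
    "progress: 5.0 s, 500.0 tps, lat 2.0 ms stddev 0.4",
    "some unrelated line that the parser skips",
]

rows = parse_progress(sample)
tps_summary = five_number_summary([r["tps"] for r in rows])
print("tps min/q1/median/q3/max:", tps_summary)
```

The same rows can be fed to any plotting tool for a per-second graph.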

--
Fabien.
