Re: pgbench - implement strict TPC-B benchmark

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - implement strict TPC-B benchmark
Date: 2019-08-01 06:52:52
Message-ID: alpine.DEB.2.21.1908010654590.2683@lancre
Lists: pgsql-hackers


Hello Tom,

> [ shrug... ] TBH, the proposed patch does not look to me like actual
> benchmark kit; it looks like a toy. Nobody who was intent on making their
> benchmark numbers look good would do a significant amount of work in a
> slow, ad-hoc interpreted language. I also wonder to what extent the
> numbers would reflect pgbench itself being the bottleneck.

> Which is really the fundamental problem I've got with all the stuff
> that's been crammed into pgbench of late --- the more computation you're
> doing there, the less your results measure the server's capabilities
> rather than pgbench's implementation details.

That is a very good question. It is easy to measure the overhead, for
instance:

sh> time pgbench -r -T 30 -M prepared
...
latency average = 2.425 ms
tps = 412.394420 (including connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.022 BEGIN;
0.061 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.038 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.046 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.042 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.036 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
2.178 END;
real 0m30.080s, user 0m0.406s, sys 0m0.689s

The cost of pgbench's interpreted part (the \set lines) is under 1/1000 of
the transaction latency (0.001 ms out of 2.425 ms). The CPU time of the
pgbench process itself (user time) accounts for 1.4% of the wall-clock
time, below the inevitable system time, which is 2.3%. pgbench overheads
are small compared to postgres connection and command execution times, and
to system time. The run above used a local socket; over an actual remote
network connection, the gap would be even larger. A profiling run could
collect more data, but that does not seem necessary.

Some parts of pgbench could be optimized: e.g. for expressions, the large
switch could be avoided with precomputed function calls; some static
analysis could infer types and avoid calls to generic functions which have
to test types; and so on. But frankly I do not think that this is
currently needed, so I would not bother unless an actual issue is proven.

Also, pgbench overheads must be compared to those of an actual client
application, which talks to the database through some language (PHP,
Python, JS, Java…) whose interpreter is written in C/C++ just like
pgbench, and through some library (ORM, DBI, JDBC…), possibly written in
that language and relying on libpq under the hood. OK, some JIT could be
involved, but that does not change the fact that there are costs there
too: such a client has to do pretty much the same things that pgbench
does, plus whatever the application does with the data.

All in all, pgbench overheads are small compared to postgres processing
times, and are representative of a reasonably optimized client application.

> In any case, even if I were in love with the script itself,

Love is probably not required for a feature demonstration :-)

> we cannot commit something that claims to be "standard TPC-B".

Yep, I clearly underestimated this legal aspect.

> It needs weasel wording that makes it clear that it isn't TPC-B, and
> then you have a problem of user confusion about why we have both
> not-quite-TPC-B-1 and not-quite-TPC-B-2, and which one to use, or which
> one was used in somebody else's tests.

I agree that confusion is no good either.

> I think if you want to show off what these pgbench features are good
> for, it'd be better to find some other example that's not in the
> midst of a legal minefield.

Yep, I got that.

To try to salvage my illustration idea: I could rename it "demo", i.e.
something quite far from "TPC-B", extend it so that it clearly differs,
e.g. by using a non-uniform random generator, and then say explicitly that
it is only vaguely inspired by "TPC-B" and is intended as a demo script
that may be updated to illustrate new features (e.g. with a non-uniform
generator, I'd really like to add a permutation layer if one becomes
available some day).

This way, the real intention of the "demo" script would be very clear.

--
Fabien.
