Re: pgbench - implement strict TPC-B benchmark

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - implement strict TPC-B benchmark
Date: 2019-08-05 20:45:53
Message-ID: alpine.DEB.2.21.1908052208280.26206@lancre


Hello Andres,

>> Which is a (somehow disappointing) * 3.3 speedup. The impact on the 3
>> complex expressions tests is not measurable, though.
>
> I don't know why that could be disappointing. We put in much more work
> for much smaller gains in other places.

Probably, but I expected a larger gain from eliminating most of the
string handling for variables.

>> Questions:
>> - how likely is such a patch to pass? (IMHO not likely)
>
> I don't see why? I didn't review the patch in any detail, but it didn't
> look crazy in quick skim? Increasing how much load can be simulated
> using pgbench, is something I personally find much more interesting than
> adding capabilities that very few people will ever use.

Yep, but my point is that the bottleneck is mostly libpq/system, as I
tried to demonstrate with the few experiments I reported.

> FWIW, the areas I find current pgbench "most lacking" during development
> work are:
>
> 1) Data load speed. The data creation is bottlenecked on fprintf in a
> single process.

It is snprintf, actually, and it could be replaced.

I submitted a patch to add more control over initialization, including a
server-side loading feature, i.e. the client does not send data, the
server generates its own, see 'G':

https://commitfest.postgresql.org/24/2086/

However, on my laptop it is slower than client-side loading on a local
socket. The client-side version does around 70 MB/s, with a client load of
20-30% and a postgres load of 85%, and I'm not sure I can hope for much
more on my SSD. On my laptop the bottleneck is postgres/disk, not fprintf.

> The index builds are done serially. The vacuum could be replaced by COPY
> FREEZE.

Well, it could be added?

> For a lot of meaningful tests one needs 10-1000s of GB of testdata -
> creating that is pretty painful.

Yep.

> 2) Lack of proper initialization integration for custom
> scripts.

Hmmm…

You can always write a psql script for schema and possibly simplistic data
initialization?

However, generating meaningful pseudo-random data for an arbitrary schema
is a pain. I wrote an external tool for that a few years ago:

http://www.coelho.net/datafiller.html

but it is still a pain.

> I.e. have steps that are in the custom script that allow -i, vacuum, etc
> to be part of the script, rather than separately executable steps.
> --init-steps doesn't do anything for that.

Sure. It just gives some control.

> 3) pgbench overhead, although that's to a significant degree libpq's fault

I'm afraid that is currently the case.

> 4) Ability to cancel pgbench and get approximate results. That currently
> works if the server kicks out the clients, but not when interrupting
> pgbench - which is just plain weird. Obviously that doesn't matter
> for "proper" benchmark runs, but often during development, it's
> enough to run pgbench past some events (say the next checkpoint).

Do you mean printing a report anyway on "Ctrl-C"?

I usually use -P 1 to see the progress, but making Ctrl-C work should be
reasonably easy.
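
A minimal sketch of the "report anyway on Ctrl-C" idea, in Python for brevity. This is not pgbench code (pgbench is written in C), and the names request_stop, run and main are hypothetical. The handler only records the request; the main loop notices it, stops cleanly, and still returns the counts needed for an approximate report.

```python
import signal
import time

stop_requested = False  # set by the signal handler, polled by the main loop


def request_stop(signum, frame):
    # Signal handler: only record the request, do no reporting here.
    global stop_requested
    stop_requested = True


def run(do_transaction, duration_s):
    # Run transactions until the deadline or a stop request, then summarize.
    start = time.monotonic()
    done = 0
    while not stop_requested and time.monotonic() - start < duration_s:
        do_transaction()
        done += 1
    elapsed = time.monotonic() - start
    return {"transactions": done, "elapsed_s": elapsed,
            "interrupted": stop_requested}


def main():
    # Hypothetical driver: Ctrl-C (SIGINT) requests a stop instead of killing us.
    signal.signal(signal.SIGINT, request_stop)
    result = run(lambda: time.sleep(0.001), duration_s=10.0)
    elapsed = result["elapsed_s"]
    tps = result["transactions"] / elapsed if elapsed > 0 else 0.0
    note = " (interrupted, approximate)" if result["interrupted"] else ""
    print(f"{result['transactions']} transactions, {tps:.1f} tps{note}")
```

Calling main() and pressing Ctrl-C before the 10 seconds elapse prints the partial figures instead of aborting without a report.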

>> - what is its impact to overall performance when actual queries
>> are performed (IMHO very small).
>
> Obviously not huge - I'd also not expect it to be unobservably small
> either.

Hmmm… Indeed, the 20 \set script runs at 2.6 M scripts/s, that is about
0.019 µs per \set, whereas any exchange over the connection costs at least
15 µs (for one client on a local socket).
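
The arithmetic behind these figures can be checked with a few lines of Python. This sketch assumes that "2.6 M/s" means 2.6 million executions per second of the whole 20-\set script; the constant and function names are made up for the illustration.

```python
# Figures quoted in the message; the names are illustrative only.
SCRIPTS_PER_SECOND = 2.6e6   # rate of the 20-command script
SETS_PER_SCRIPT = 20         # number of set commands in that script
ROUND_TRIP_US = 15.0         # lower bound for one exchange with the server


def cost_per_set_us(scripts_per_second, sets_per_script):
    # Average time spent on one set command, in microseconds.
    return 1e6 / (scripts_per_second * sets_per_script)


per_set = cost_per_set_us(SCRIPTS_PER_SECOND, SETS_PER_SCRIPT)
ratio = ROUND_TRIP_US / per_set

print(f"{per_set:.3f} us per set command")
print(f"one round trip costs about {ratio:.0f} set commands")
```

Under that reading, one exchange over the connection costs as much as several hundred \set commands, which is why the expression speedup is hard to observe once actual queries are involved.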

--
Fabien.
