Re: CPU costs of random_zipfian in pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-03-24 18:05:49
Message-ID: alpine.DEB.2.21.1903241823070.9939@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello Tom & Tomas,

>> If the choice is between reporting the failure to the user, and
>> addressing the failure, surely the latter would be the default option?
>> Particularly if the user can't really address the issue easily
>> (recompiling psql is not very practical solution).
>>>> I remain of the opinion that we ought to simply rip out support for
>>>> zipfian with s < 1.
>> +1 to that
> If this is done, some people with zipfian distribution that currently
> work might be unhappy.

After giving it some thought, I think that this cannot be fully fixed for

The attached patch removes the code for param in (0, 1), and slightly
improve the documentation about the performance, if you want to proceed.

For s > 1, there is no such constraint, and it works fine, there is no
reason to remove it.

Given the constraint of Jim Gray's approximated method for s in (0, 1),
which really does zipfian for the first two integers and then uses an
exponential approximation, the only approach is that the parameters must
be computed in a partial eval preparation phase before the bench code is
run. This means that only (mostly) constants would be allowed as
parameters when s is in (0, 1), but I think that this is acceptable
because anyway the method fundamentaly requires it. I think that it can be
implemented reasonably well (meaning not too much code), but would
requires a few round of reviews if someone implements it (for a reminder,
I was only the reviewer on this one). An added benefit would be that the
parameter cache could be shared between thread, which would be a good

The attached other attached patch illustrate what I call poor performance
for stupid parameters (no point in doing zipfian on 2 integers…) :

./pgbench -T 3 -D n=2 -D s=1.01 -f zipf_perf.sql # 46981 tps
./pgbench -T 3 -D n=2 -D s=1.001 -f zipf_perf.sql # 6187 tps
./pgbench -T 3 -D n=2 -D s=1.0001 -f zipf_perf.sql # 710 tps

./pgbench -T 3 -D n=100 -D s=1.01 -f zipf_perf.sql # 142910 tps
./pgbench -T 3 -D n=100 -D s=1.001 -f zipf_perf.sql # 21214 tps
./pgbench -T 3 -D n=100 -D s=1.0001 -f zipf_perf.sql # 2466 tps

./pgbench -T 3 -D n=1000000 -D s=1.01 -f zipf_perf.sql # 376453 tps
./pgbench -T 3 -D n=1000000 -D s=1.001 -f zipf_perf.sql # 57441 tps
./pgbench -T 3 -D n=1000000 -D s=1.0001 -f zipf_perf.sql # 6780 tps

Maybe the implementation could impose that s is at least 1.001 to avoid
the lower performance?


Attachment Content-Type Size
pgbench-remove-zipf-below-one-1.patch text/x-diff 10.3 KB
zipf_perf.sql application/x-sql 33 bytes

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-24 18:06:59 Re: warning to publication created and wal_level is not set to logical
Previous Message David Fetter 2019-03-24 17:54:48 Re: warning to publication created and wal_level is not set to logical