Re: CPU costs of random_zipfian in pgbench

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-02-17 16:09:27
Message-ID: 6065.1550419767@sss.pgh.pa.us
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>> and I ran head-first into an issue with rather excessive CPU costs.

> If you want skewed but not especially zipfian, use exponential which is
> quite cheap. Also zipfian with a > 1.0 parameter does not have to compute
> the harmonic number, so it depends on the parameter.

Maybe we should drop support for parameter values < 1.0, then. The idea
that pgbench is doing something so expensive as to require caching seems
flat-out insane from here. That cannot be seen as anything but a foot-gun
for unwary users. Under what circumstances would an informed user use
that random distribution rather than another far-cheaper-to-compute one?

> ... This is why I submitted a pseudo-random permutation
> function, which alas does not get much momentum from committers.

TBH, I think pgbench is now much too complex; it does not need more
features, especially not ones that need large caveats in the docs.
(What exactly is the point of having zipfian at all?)

regards, tom lane
