Re: CPU costs of random_zipfian in pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-03-24 20:21:45
Message-ID: alpine.DEB.2.21.1903242006310.9939@lancre
Lists: pgsql-hackers

Hello Tom,

>>> If this is done, some people whose zipfian distributions currently
>>> work might be unhappy.
>> After giving it some thought, I think that this cannot be fully fixed for
>> 12.
> Just to clarify --- my complaint about "over engineering" referred to
> the fact that a cache exists at all; fooling around with the overflow
> behavior isn't really going to answer that.

Having some cache really makes sense for s < 1, AFAICS.

> The bigger picture here is that a benchmarking tool that contains its
> own performance surprises is not a nice thing to have.

Hmmm. It seems that it depends. Sometimes self-inflicted wounds are the
soldier's responsibility, sometimes they must be prevented.

> It's not very hard to imagine somebody wasting a lot of time trying to
> explain weird results, only to find out that the cause is unstable
> performance of random_zipfian. Or worse, people might draw totally
> incorrect conclusions if they fail to drill down enough to realize that
> there's a problem in pgbench rather than on the server side.

Yep, benchmarking is tougher than it looks: it is very easy to get it
wrong without knowing it, whatever tool you use.

>> Given the constraint of Jim Gray's approximated method for s in (0, 1),
>> which really does zipfian for the first two integers and then uses an
>> exponential approximation, the only approach is that the parameters must
>> be computed in a partial eval preparation phase before the bench code is
>> run. This means that only (mostly) constants would be allowed as
>> parameters when s is in (0, 1), but I think that this is acceptable
>> because the method fundamentally requires it anyway.
> Yeah, if we could store all the required harmonic numbers before the
> test run timing starts, that would address the concern about stable
> performance. But I have to wonder whether zipfian with s < 1 is useful
> enough to justify so much code.

I do not know about that. I do not think that Jim Gray invented an
approximate method for s < 1 because he thought it was fun. I think he
did it because his benchmark data required it. If you need it, you need
it. If you do not need it, you do not care.
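For reference, Gray's approximation splits naturally into an O(n)
parameter-computation step and a constant-time draw, which is what makes a
preparation phase (or a cache keyed on (n, s)) attractive. A minimal Python
sketch of the common formulation of that method (not pgbench's exact code;
function names are mine):

```python
import random

def zipf_prepare(n, s):
    """Precompute the parameters of Gray's approximation for s in (0, 1).
    The generalized harmonic number H(n, s) costs O(n) to compute, which
    is why doing it once before the timed run (or caching it) matters."""
    assert 0.0 < s < 1.0 and n >= 2
    harmonicn = sum(i ** -s for i in range(1, n + 1))  # H(n, s): the O(n) part
    alpha = 1.0 / (1.0 - s)
    eta = (1.0 - (2.0 / n) ** (1.0 - s)) / (1.0 - (1.0 + 0.5 ** s) / harmonicn)
    return n, s, harmonicn, alpha, eta

def zipf_draw(params, rng=random.random):
    """Constant-time draw once the parameters are prepared: exact
    probabilities for ranks 1 and 2, exponential approximation beyond."""
    n, s, harmonicn, alpha, eta = params
    u = rng()
    uz = u * harmonicn
    if uz < 1.0:
        return 1                     # exact probability for rank 1
    if uz < 1.0 + 0.5 ** s:
        return 2                     # exact probability for rank 2
    # approximate tail; clamp to n for safety at the boundary
    return min(n, 1 + int(n * (eta * u - eta + 1.0) ** alpha))
```

With the parameters computed up front, every draw during the benchmark run
itself is O(1), so the timed phase sees stable per-call cost.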

>> The other attached patch illustrates what I call poor performance
>> for stupid parameters (no point in doing zipfian on 2 integers…):
>> ...
>> Maybe the implementation could impose that s is at least 1.001 to avoid
>> the lower performance?
> I was wondering about that too. It seems like it'd be a wise idea to
> further constrain s and/or n to ensure that the s > 1 code path doesn't do
> anything too awful ...

Yep. The attached version enforces s >= 1.001, which avoids the worst of
the iteration cost, according to my small tests.
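To illustrate why a floor on s helps: for s > 1 a bounded zipfian draw can
be done by rejection (a method along the lines of Devroye's), and as s
approaches 1 the acceptance rate drops, so the loop spins longer. A rough
Python sketch with an iteration counter added for illustration (again not
pgbench's actual code):

```python
import math
import random

def zipfian_reject(n, s, rng=random.random):
    """Bounded zipfian draw for s > 1 by rejection sampling.
    Returns (value, iterations); the iteration count is only there to
    show how the cost grows as s approaches 1 from above."""
    assert s > 1.0
    b = 2.0 ** (s - 1.0)
    iterations = 0
    while True:
        iterations += 1
        u = 1.0 - rng()                  # uniform in (0, 1], avoids u == 0
        # If x would exceed n we must reject anyway; skipping early also
        # avoids float overflow in u ** (-1/(s-1)) for tiny u.
        if u <= (n + 1.0) ** (1.0 - s):
            continue
        x = math.floor(u ** (-1.0 / (s - 1.0)))
        t = (1.0 + 1.0 / x) ** (s - 1.0)
        if rng() * x * (t - 1.0) / (b - 1.0) <= t / b:
            return int(x), iterations
```

Averaging the iteration counts over many draws shows the cost rising as s
shrinks toward 1 (e.g. compare s = 2.0 against s = 1.1 on the same n),
which is the instability a lower bound like 1.001 is meant to cap.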

> unless someone comes up with a better implementation that has stable
> performance without such constraints.



Attachment Content-Type Size
pgbench-remove-zipf-below-one-2.patch text/x-diff 11.0 KB
