Re: CPU costs of random_zipfian in pgbench

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-03-24 18:27:46
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>>>> I remain of the opinion that we ought to simply rip out support for
>>>> zipfian with s < 1.

>>> +1 to that

>> If this is done, some people with zipfian distribution that currently
>> work might be unhappy.

> After giving it some thought, I think that this cannot be fully fixed for
> 12.

Just to clarify --- my complaint about "over engineering" referred to
the fact that a cache exists at all; fooling around with the overflow
behavior isn't really going to answer that.

The bigger picture here is that a benchmarking tool that contains its
own performance surprises is not a nice thing to have. It's not very
hard to imagine somebody wasting a lot of time trying to explain weird
results, only to find out that the cause is unstable performance of
random_zipfian. Or worse, people might draw totally incorrect conclusions
if they fail to drill down enough to realize that there's a problem in
pgbench rather than on the server side.

> Given the constraint of Jim Gray's approximated method for s in (0, 1),
> which really does zipfian for the first two integers and then uses an
> exponential approximation, the only approach is that the parameters must
> be computed in a partial eval preparation phase before the bench code is
> run. This means that only (mostly) constants would be allowed as
> parameters when s is in (0, 1), but I think that this is acceptable
> because the method fundamentally requires it anyway.

Yeah, if we could store all the required harmonic numbers before the
test run timing starts, that would address the concern about stable
performance. But I have to wonder whether zipfian with s < 1 is useful
enough to justify so much code.

> The other attached patch illustrates what I call poor performance
> for stupid parameters (no point in doing zipfian on 2 integers…) :
> ...
> Maybe the implementation could impose that s is at least 1.001 to avoid
> the lower performance?

I was wondering about that too. It seems like it'd be a wise idea to
further constrain s and/or n to ensure that the s > 1 code path doesn't do
anything too awful ... unless someone comes up with a better implementation
that has stable performance without such constraints.

regards, tom lane
