|From:||Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>|
|To:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Cc:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: CPU costs of random_zipfian in pgbench|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
>>> If this is done, some people with zipfian distribution that currently
>>> work might be unhappy.
>> After giving it some thought, I think that this cannot be fully fixed for
> Just to clarify --- my complaint about "over engineering" referred to
> the fact that a cache exists at all; fooling around with the overflow
> behavior isn't really going to answer that.
Having some cache really makes sense for s < 1, AFAICS.
> The bigger picture here is that a benchmarking tool that contains its
> own performance surprises is not a nice thing to have.
Hmmm. It seems that it depends. Sometimes self inflected wounds are the
soldier's responsability, sometimes they must be prevented.
> It's not very hard to imagine somebody wasting a lot of time trying to
> explain weird results, only to find out that the cause is unstable
> performance of random_zipfian. Or worse, people might draw totally
> incorrect conclusions if they fail to drill down enough to realize that
> there's a problem in pgbench rather than on the server side.
Yep, benchmarking is tougher than it looks: it is very easy to get it
wrong without knowing, whatever tool you use.
>> Given the constraint of Jim Gray's approximated method for s in (0, 1),
>> which really does zipfian for the first two integers and then uses an
>> exponential approximation, the only approach is that the parameters must
>> be computed in a partial eval preparation phase before the bench code is
>> run. This means that only (mostly) constants would be allowed as
>> parameters when s is in (0, 1), but I think that this is acceptable
>> because anyway the method fundamentaly requires it.
> Yeah, if we could store all the required harmonic numbers before the
> test run timing starts, that would address the concern about stable
> performance. But I have to wonder whether zipfian with s < 1 is useful
> enough to justify so much code.
I do not know about that. I do not think that Jim Gray chose to invent an
approximated method for s < 1 because he thought it was fun. I think he
did it because his benchmark data required it. If you need it, you need
it. If you do not need it, you do not care.
>> The attached other attached patch illustrate what I call poor performance
>> for stupid parameters (no point in doing zipfian on 2 integers…) :
>> Maybe the implementation could impose that s is at least 1.001 to avoid
>> the lower performance?
> I was wondering about that too. It seems like it'd be a wise idea to
> further constrain s and/or n to ensure that the s > 1 code path doesn't do
> anything too awful ...
Yep. The attached version enforces s >= 1.001, which avoids the worse cost
of iterating, according to my small tests.
> unless someone comes up with a better implementation that has stable
> performance without such constraints.
|Next Message||Rahila Syed||2019-03-24 20:42:28||Re: monitoring CREATE INDEX [CONCURRENTLY]|
|Previous Message||Tom Lane||2019-03-24 18:27:46||Re: CPU costs of random_zipfian in pgbench|