Re: CPU costs of random_zipfian in pgbench

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Fabien Coelho <postgresql(dot)org(at)coelho(dot)net>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-03-24 03:33:03
Message-ID: 010a884f-8ffc-4cb7-9773-fd05f32ef4cc@2ndquadrant.com
Lists: pgsql-hackers

On 3/23/19 6:44 PM, Fabien COELHO wrote:
>
> Hello Tom,
>
>> I started to look through this, and the more I looked the more unhappy
>> I got that we're having this discussion at all.  The zipfian support
>> in pgbench is seriously over-engineered and under-documented.  As an
>> example, I was flabbergasted to find out that the end-of-run summary
>> statistics now include this:
>>
>>    /* Report zipfian cache overflow */
>>    for (i = 0; i < nthreads; i++)
>>    {
>>        totalCacheOverflows += threads[i].zipf_cache.overflowCount;
>>    }
>>    if (totalCacheOverflows > 0)
>>    {
>>        printf("zipfian cache array overflowed %d time(s)\n",
>> totalCacheOverflows);
>>    }
>>
>> What is the point of that, and if there is a point, why is it nowhere
>> mentioned in pgbench.sgml?
>
> Indeed, there should.
>
>> What would a user do with this information, and how would they know
>> what to do?
>
> Sure, but it was unclear what to do. Extending the cache to avoid that
> would look like over-engineering.
>

That seems like a rather strange argument. What exactly is so complex
about resizing the cache that it qualifies as over-engineering?

If the choice is between reporting the failure to the user and
addressing the failure, surely the latter should be the default option?
Particularly since the user can't really address the issue easily
(recompiling pgbench is not a very practical solution).

>> I remain of the opinion that we ought to simply rip out support for
>> zipfian with s < 1.
>

+1 to that

> Some people really want zipfian because it reflects their data access
> pattern, possibly with s < 1.
>
> We cannot help it: real life seems zipfianly distributed :-)
>

Sure. But that hardly means we ought to provide algorithms that we know
are not suitable for benchmarking tools, which I'd argue is the case here.

Also, we have two algorithms for generating zipfian distributions. Why
wouldn't it be sufficient to keep just one of them?
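
For reference, the generator used when s > 1 is a rejection method
(after Devroye) that needs no per-(n,s) precomputation and hence no
cache. A rough sketch, with uniform() standing in for the actual PRNG
(assumed to return a double in (0, 1)):

    #include <math.h>

    /* rough sketch of the rejection method for s > 1 (Devroye);
     * uniform() is assumed to return a double in (0, 1) */
    static long
    zipf_rejection(long n, double s, double (*uniform)(void))
    {
        double      b = pow(2.0, s - 1.0);

        for (;;)
        {
            double      u = uniform();
            double      v = uniform();
            double      x = floor(pow(u, -1.0 / (s - 1.0)));
            double      t = pow(1.0 + 1.0 / x, s - 1.0);

            /* accept if within bounds and the rejection test passes */
            if (x <= (double) n &&
                v * x * (t - 1.0) / (b - 1.0) <= t / b)
                return (long) x;
        }
    }

Which is to say, only the s < 1 case needs the cache at all.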

>> It's not useful for benchmarking purposes to have a random-number
>> function with such poor computational properties.
>
> This is mostly a startup cost; the generation cost while a bench is
> running is reasonable. How best to implement the precomputation is an
> open question.
>
>

... which means it's not a startup cost. IMHO this simply shows pgbench
does not have the necessary infrastructure to provide this feature.

> As a reviewer I was not thrilled by the cache stuff, but I had no better
> idea that would not fall under "over-over engineering" or the like.
>
> Maybe it could error out and say "recompile me", but then someone
> would have said "that is unacceptable".
>
> Maybe it could auto extend the cache, but that is still more unnecessary
> over-engineering, IMHO.
>

I'm puzzled. Why would that be over-engineering?
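
To be concrete, here is a minimal sketch of what auto-extending would
take; the struct loosely mirrors the zipfian cache, but the grow helper
is hypothetical (pgbench's real cache is a fixed-size array):

    #include <stdbool.h>
    #include <stdlib.h>

    /* hypothetical sketch, not pgbench's actual structures */
    typedef struct
    {
        int         nb_cells;       /* cells in use */
        int         max_cells;      /* allocated capacity */
        size_t      cell_size;      /* size of one (n,s) parameter set */
        void       *cells;
    } ZipfCacheSketch;

    static bool
    zipf_cache_grow(ZipfCacheSketch *cache)
    {
        int         new_max = cache->max_cells * 2;
        void       *tmp = realloc(cache->cells, new_max * cache->cell_size);

        if (tmp == NULL)
            return false;       /* keep the old array; caller may still overflow */

        cache->cells = tmp;
        cache->max_cells = new_max;
        return true;
    }

A dozen lines of doubling-on-demand hardly seems like over-engineering.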

> Maybe there could be some mandatory declarations or partial eval that
> could precompute the needed parameters before the bench is started,
> with a clear message "precomputing stuff...", but that would be over
> over over engineering again... and that would mean restricting
> random_zipfian parameters to near-constants, which would require some
> explaining, but maybe it is an option. I guess that in the paper's
> original context, the parameters (s & n) are known before the bench is
> started, so that the needed values are computed offline once and for all.
>
>> I think leaving it in there is just a foot-gun: we'd be a lot better
>> off throwing an error that tells people to use some other distribution.
>
> When s < 1, the startup cost is indeed a pain. However, it is a pain
> prescribed by a Turing Award.
>

Then we shouldn't have it, probably. Or we should at least implement a
proper startup phase, so that the costly precomputation is not included
in the test interval.
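
For context, the costly precomputation for s < 1 amounts to a
generalized harmonic number, H(n,s) = sum(1/k^s, k = 1..n): a single
pass, but linear in n, so for large n it can easily dominate a short
run. A sketch (not the actual pgbench code):

    #include <math.h>

    /* sketch of the kind of precomputation needed for s < 1;
     * linear in n, hence the caching and the startup pain */
    static double
    generalized_harmonic(long n, double s)
    {
        double      sum = 0.0;

        for (long k = 1; k <= n; k++)
            sum += pow((double) k, -s);
        return sum;
    }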

>> Or if we do leave it in there, we for sure have to have documentation
>> that *actually* explains how to use it, which this patch still doesn't.
>
> I'm not sure what explanation there could be about how to use it: one
> calls the function to obtain pseudo-random integers with the desired
> distribution?
>

Well, I'd argue the current description "performance is poor" is not
particularly clear.

>> There's nothing suggesting that you'd better not use a large number of
>> different (n,s) combinations.
>
> Indeed, there is no caveat about this point, as noted above.
>
> Please find an updated patch for the documentation, pointing out the
> existence of the cache and advice not to overdo it.
>
> It does not solve the underlying problem which raised your complaint,
> but at least it is documented.
>

I think you forgot to attach the patch ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
