Re: CPU costs of random_zipfian in pgbench

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Fabien Coelho <postgresql(dot)org(at)coelho(dot)net>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-03-23 17:01:23
Message-ID: 3703.1553360483@sss.pgh.pa.us
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
> [ pgbench-zipf-doc-3.patch ]

I started to look through this, and the more I looked the more unhappy
I got that we're having this discussion at all. The zipfian support
in pgbench is seriously over-engineered and under-documented. As an
example, I was flabbergasted to find out that the end-of-run summary
statistics now include this:

    /* Report zipfian cache overflow */
    for (i = 0; i < nthreads; i++)
    {
        totalCacheOverflows += threads[i].zipf_cache.overflowCount;
    }
    if (totalCacheOverflows > 0)
    {
        printf("zipfian cache array overflowed %d time(s)\n", totalCacheOverflows);
    }

What is the point of that, and if there is a point, why is it nowhere
mentioned in pgbench.sgml? What would a user do with this information,
and how would they know what to do?

I remain of the opinion that we ought to simply rip out support for
zipfian with s < 1. It's not useful for benchmarking purposes to have
a random-number function with such poor computational properties.
I think leaving it in there is just a foot-gun: we'd be a lot better
off throwing an error that tells people to use some other distribution.

Or if we do leave it in there, we for sure have to have documentation
that *actually* explains how to use it, which this patch still doesn't.
There's nothing suggesting that you'd better not use a large number of
different (n,s) combinations.
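
To illustrate why many different (n,s) combinations could hurt, here is a
minimal sketch of a small per-thread cache with least-recently-used
eviction. It is not pgbench's code: the names (SketchCache, sketch_lookup,
sketch_run), the cache size of 4, and the cost model (a miss is charged n
steps, standing in for an O(n) per-(n,s) setup) are all assumptions made
for the example.

```c
#include <stdint.h>

#define SKETCH_CACHE_SIZE 4     /* hypothetical size, kept small on purpose */

typedef struct
{
    int64_t  n;                 /* upper bound of the distribution */
    double   s;                 /* zipfian parameter */
    uint64_t last_used;         /* logical timestamp of the last lookup */
} SketchCell;

typedef struct
{
    int        nb_cells;        /* cells currently in use */
    int        overflow_count;  /* evictions so far */
    uint64_t   clock;           /* logical clock, bumped on every lookup */
    int64_t    setup_steps;     /* modeled cost: n steps charged per miss */
    SketchCell cells[SKETCH_CACHE_SIZE];
} SketchCache;

/* Look up (n,s); on a miss, charge the modeled setup cost and cache it. */
static void
sketch_lookup(SketchCache *cache, int64_t n, double s)
{
    int         i;
    int         victim = 0;

    cache->clock++;
    for (i = 0; i < cache->nb_cells; i++)
    {
        if (cache->cells[i].n == n && cache->cells[i].s == s)
        {
            cache->cells[i].last_used = cache->clock;
            return;             /* hit: no setup cost */
        }
    }

    /* Miss: stand-in for an O(n) setup computation (assumed cost model). */
    cache->setup_steps += n;

    if (cache->nb_cells < SKETCH_CACHE_SIZE)
        victim = cache->nb_cells++;
    else
    {
        /* Cache full: evict the least recently used cell. */
        for (i = 1; i < cache->nb_cells; i++)
            if (cache->cells[i].last_used < cache->cells[victim].last_used)
                victim = i;
        cache->overflow_count++;
    }
    cache->cells[victim].n = n;
    cache->cells[victim].s = s;
    cache->cells[victim].last_used = cache->clock;
}

/* Cycle through `distinct` values of s (all below 1) for `rounds` rounds. */
static SketchCache
sketch_run(int distinct, int rounds, int64_t n)
{
    SketchCache cache = {0};
    int         r;
    int         k;

    for (r = 0; r < rounds; r++)
        for (k = 0; k < distinct; k++)
            sketch_lookup(&cache, n, 0.1 + 0.1 * k);
    return cache;
}
```

Under these assumptions, sketch_run(4, 10, 1000) pays the setup cost only
four times, whereas sketch_run(5, 10, 1000) misses on every one of its 50
lookups, because cycling through one more combination than the cache holds
defeats LRU eviction entirely. An overflow counter in the summary output
only hints at this, without telling the user what to change.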

regards, tom lane
