Re: CPU costs of random_zipfian in pgbench

From:	David Fetter <david(at)fetter(dot)org>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: CPU costs of random_zipfian in pgbench
Date:	2019-02-19 18:03:03
Message-ID:	20190219180303.GZ10435@fetter.org
Lists:	pgsql-hackers

On Sun, Feb 17, 2019 at 11:02:37PM +0100, Tomas Vondra wrote:
> On 2/17/19 6:33 PM, David Fetter wrote:
> > On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote:
> >> Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
> >>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
> >>>> and I ran head-first into an issue with rather excessive CPU costs.
> >>
> >>> If you want skewed but not especially zipfian, use exponential which is
> >>> quite cheap. Also zipfian with a > 1.0 parameter does not have to compute
> >>> the harmonic number, so it depends on the parameter.
> >>
> >> Maybe we should drop support for parameter values < 1.0, then. The idea
> >> that pgbench is doing something so expensive as to require caching seems
> >> flat-out insane from here. That cannot be seen as anything but a foot-gun
> >> for unwary users. Under what circumstances would an informed user use
> >> that random distribution rather than another far-cheaper-to-compute one?
> >>
> >>> ... This is why I submitted a pseudo-random permutation
> >>> function, which alas does not get much momentum from committers.
> >>
> >> TBH, I think pgbench is now much too complex; it does not need more
> >> features, especially not ones that need large caveats in the docs.
> >> (What exactly is the point of having zipfian at all?)
> >
> > Taking a statistical perspective, Zipfian distributions violate some of
> > the assumptions we make when we assume uniform distributions. This matters
> > because Zipf-distributed data sets are quite common in real life.
> >
>
> I don't think there's any disagreement about the value of non-uniform
> distributions. The question is whether it has to be a zipfian one, when
> the best algorithm we know about is this expensive in some cases? Or
> would an exponential distribution be enough?
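To make the cost argument concrete, here is a minimal sketch in Python (not pgbench's C code; the function names are hypothetical). Computing the generalized harmonic number H(n, s) = sum of i^-s for i = 1..n, which Fabien notes is needed when the parameter is below 1.0, takes one pass over all n ranks, so it is O(n) per (n, s) pair; that is why the result gets cached. A skewed draw from a truncated exponential by inverse transform is O(1). The truncation scheme and the `threshold` name are assumptions for illustration.

```python
import math
import random


def generalized_harmonic(n: int, s: float) -> float:
    """H(n, s) = sum_{i=1}^{n} i**(-s): one pass over all n ranks, so O(n)."""
    return math.fsum(i ** -s for i in range(1, n + 1))


def exponential_rank(n: int, threshold: float, rng: random.Random) -> int:
    """Draw a rank in [1, n] from an exponential truncated to [0, 1): O(1).

    Hypothetical helper; CDF is F(x) = (1 - e^(-t*x)) / (1 - e^(-t)).
    """
    u = rng.random()  # uniform in [0, 1)
    # Inverse CDF of the truncated exponential; x lands in [0, 1).
    x = -math.log(1.0 - u * (1.0 - math.exp(-threshold))) / threshold
    # Scale to a rank; min() guards against rounding at the top edge.
    return 1 + min(n - 1, int(x * n))
```

The exponential draw does a constant amount of work no matter how large n is, while `generalized_harmonic(10_000_000, 0.5)` walks ten million terms before a single Zipf value can be produced.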

I suppose people who care about the difference between Zipf and
exponential would appreciate having the former around to test.
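One way to see the difference between Zipf and exponential skew is a minimal Python sketch (the helper names are hypothetical, and a discrete exponential, i.e. geometric, weighting stands in for pgbench's exponential). It compares how much likelier rank k is than rank 2k. Under Zipf the ratio p(k)/p(2k) = 2^s is the same at every rank (a power law), while under an exponential it is e^(rate * k) and grows with k, so the exponential's tail empties much faster.

```python
import math


def zipf_pmf(n: int, s: float) -> list[float]:
    """P(rank = k) proportional to k**(-s), k = 1..n (power law)."""
    weights = [k ** -s for k in range(1, n + 1)]
    total = math.fsum(weights)
    return [w / total for w in weights]


def geometric_pmf(n: int, rate: float) -> list[float]:
    """P(rank = k) proportional to exp(-rate * k), k = 1..n (discrete exponential)."""
    weights = [math.exp(-rate * k) for k in range(1, n + 1)]
    total = math.fsum(weights)
    return [w / total for w in weights]


def halving_ratio(pmf: list[float], k: int) -> float:
    """How much likelier rank k is than rank 2k (pmf is 0-indexed)."""
    return pmf[k - 1] / pmf[2 * k - 1]


if __name__ == "__main__":
    z = zipf_pmf(1000, 1.0)
    g = geometric_pmf(1000, 0.01)
    for k in (10, 100, 400):
        print(k, halving_ratio(z, k), halving_ratio(g, k))
```

With s = 1 the Zipf ratio stays at 2 for every k, while the exponential's ratio climbs from about 1.1 at k = 10 to about 55 at k = 400. That heavy tail is the property a Zipf-specific test would exercise and an exponential one would not.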

Whether pgbench should support this is a different question, and it's
sounding a little like the answer to that one is "no."

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Re: CPU costs of random_zipfian in pgbench at 2019-02-17 22:02:37 from Tomas Vondra
