Quick Links

Re: [WIP] Zipfian distribution in pgbench

From:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To:	Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>
Cc:	PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [WIP] Zipfian distribution in pgbench
Date:	2017-08-05 08:15:05
Message-ID:	alpine.DEB.2.20.1708050950220.16395@lancre
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello Alik,

>> So I would be in favor of expanding the documentation but not
>> restricting the parameter beyond avoiding value 1.0.
>
> I have removed restriction and expanded documentation in attaching patch v5.

I've done some math investigations, which consisted in spending one hour
with Christian, a statistician colleague of mine. He took an old book out
of a shelf, opened it to page 550 (roughly in the middle), and explained
to me how to build a real zipfian distribution random generator.

The iterative method is for parameter a>1 and works for unbounded values.
It is simple to add a bound. In practice the iterative method is quite
effective, i.e. number of iterations is typically small, at least if the
bound is large and if parameter a is not too close to 1.

I've attached a python3 script which implements the algorithm. It looks
like magic. Beware that a C implementation should take care of float and
int overflows.

# usage: a, #values, #tests

sh> zipf.py 1.5 1000 1000000
# after 1.7 seconds
c = [391586, 138668, 75525, 49339, 35222, 26621, ...
... 11, 13, 12, 11, 16] (1.338591 iterations per draw)

sh> zipf.py 1.1 1000 1000000
# after 3.1 seconds
c = [179302, 83927, 53104, 39015, 30557, 25164, ...
... 82, 95, 93, 81, 80] (2.681451 iterations per draw)

I think that this method should be used for a>1, and the other very rough
one can be kept for parameter a in [0, 1), a case which does not make much
sense to a mathematician as it diverges if unbounded.

--
Fabien.

Attachment	Content-Type	Size
zipf.py	text/x-python	1.6 KB

In response to

Re: [WIP] Zipfian distribution in pgbench at 2017-07-21 08:43:31 from Alik Khilazhev

Responses

Re: [WIP] Zipfian distribution in pgbench at 2017-08-06 18:22:26 from Alik Khilazhev
Re: [WIP] Zipfian distribution in pgbench at 2017-08-13 12:13:10 from Alik Khilazhev

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fabien COELHO	2017-08-05 09:03:34	Re: Row Level Security Documentation
Previous Message	Michael Paquier	2017-08-05 08:14:11	Re: pg_stop_backup(wait_for_archive := true) on standby server