Re: [WIP] Zipfian distribution in pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Zipfian distribution in pgbench
Date: 2017-08-05 08:15:05
Message-ID: alpine.DEB.2.20.1708050950220.16395@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Alik,

>> So I would be in favor of expanding the documentation but not
>> restricting the parameter beyond avoiding value 1.0.
>
> I have removed restriction and expanded documentation in attaching patch v5.

I've done some math investigations, which consisted in spending one hour
with Christian, a statistician colleague of mine. He took an old book out
of a shelf, opened it to page 550 (roughly in the middle), and explained
to me how to build a real zipfian distribution random generator.

The iterative method is for parameter a>1 and works for unbounded values.
It is simple to add a bound. In practice the iterative method is quite
effective, i.e. number of iterations is typically small, at least if the
bound is large and if parameter a is not too close to 1.

I've attached a python3 script which implements the algorithm. It looks
like magic. Beware that a C implementation should take care of float and
int overflows.

# usage: a, #values, #tests

sh> zipf.py 1.5 1000 1000000
# after 1.7 seconds
c = [391586, 138668, 75525, 49339, 35222, 26621, ...
... 11, 13, 12, 11, 16] (1.338591 iterations per draw)

sh> zipf.py 1.1 1000 1000000
# after 3.1 seconds
c = [179302, 83927, 53104, 39015, 30557, 25164, ...
... 82, 95, 93, 81, 80] (2.681451 iterations per draw)

I think that this method should be used for a>1, and the other very rough
one can be kept for parameter a in [0, 1), a case which does not make much
sense to a mathematician as it diverges if unbounded.

--
Fabien.

Attachment Content-Type Size
zipf.py text/x-python 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-08-05 09:03:34 Re: Row Level Security Documentation
Previous Message Michael Paquier 2017-08-05 08:14:11 Re: pg_stop_backup(wait_for_archive := true) on standby server