|From:||Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>|
|To:||Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>|
|Cc:||PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: [WIP] Zipfian distribution in pgbench|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
>> So I would be in favor of expanding the documentation but not
>> restricting the parameter beyond avoiding value 1.0.
> I have removed restriction and expanded documentation in attaching patch v5.
I've done some math investigations, which consisted in spending one hour
with Christian, a statistician colleague of mine. He took an old book out
of a shelf, opened it to page 550 (roughly in the middle), and explained
to me how to build a real zipfian distribution random generator.
The iterative method is for parameter a>1 and works for unbounded values.
It is simple to add a bound. In practice the iterative method is quite
effective, i.e. number of iterations is typically small, at least if the
bound is large and if parameter a is not too close to 1.
I've attached a python3 script which implements the algorithm. It looks
like magic. Beware that a C implementation should take care of float and
# usage: a, #values, #tests
sh> zipf.py 1.5 1000 1000000
# after 1.7 seconds
c = [391586, 138668, 75525, 49339, 35222, 26621, ...
... 11, 13, 12, 11, 16] (1.338591 iterations per draw)
sh> zipf.py 1.1 1000 1000000
# after 3.1 seconds
c = [179302, 83927, 53104, 39015, 30557, 25164, ...
... 82, 95, 93, 81, 80] (2.681451 iterations per draw)
I think that this method should be used for a>1, and the other very rough
one can be kept for parameter a in [0, 1), a case which does not make much
sense to a mathematician as it diverges if unbounded.
|Next Message||Fabien COELHO||2017-08-05 09:03:34||Re: Row Level Security Documentation|
|Previous Message||Michael Paquier||2017-08-05 08:14:11||Re: pg_stop_backup(wait_for_archive := true) on standby server|