Re: [WIP] Zipfian distribution in pgbench

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Zipfian distribution in pgbench
Date: 2017-07-19 14:21:39
Message-ID: alpine.DEB.2.20.1707191616140.4496@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Alik,

> I am attaching patch v3.

> Among other things I fixed small typo in description of
> random_exponential function in pgbench.sgml file.

Ok. Probably this typo should be committed separatly and independently.

A few comments about v3:

Patch applies cleanly, compiles, works.

About the maths: As already said, I'm not at ease with a random_zipfian
function which does not display a (good) zipfian distribution. At the
minimum the documentation should be clear about the approximations implied
depending on the parameter value.

In the litterature the theta parameter seems to be often called alpha
or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest to
stick to "s" instead of "theta"?

About the code: looks simpler than the previous version, which is good.

Double constants should be used when the underlying type is a double,
instead of relying on implicit int to double promotion (0 -> 0.0, 1 -> 1.0).

Functions zipfZeta(n, theta) does not really computes the zeta(n) function,
so I think that a better name should be chosen. It seems to compute
H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.

The handling of cache overflow by randomly removing one entry looks like
a strange idea. Rather remove the oldest entry?

ISTM that it should print a warning once if the cache array overflows as
performance would drop heavily.

If the zipf cache is constant size, there is no point in using dynamic
allocation, just declare an array...

Comments need updating: eg "theta parameter of previous execution" which
dates back when there was only one value.

There should be a comment to explain that the structure is in the thread
for thread safety.

There should be non regression tests somehow. If the "improve pgbench
tap test infrastructure" get through, things should be added there.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-07-19 14:36:51 Re: Bug in ExecModifyTable function and trigger issues for foreign tables
Previous Message Alexander Kuzmenkov 2017-07-19 13:42:06 Re: Proposal for CSN based snapshots