Re: [WIP] Zipfian distribution in pgbench

From: Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Zipfian distribution in pgbench
Date: 2017-07-20 16:49:01
Message-ID: 9F00A6CA-EA92-4DF2-9007-652AFE3636D2@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Fabien,

I am attaching patch v4.

> On 19 Jul 2017, at 17:21, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> About the maths: As already said, I'm not at ease with a random_zipfian function which does not display a (good) zipfian distribution. At the minimum the documentation should be clear about the approximations implied depending on the parameter value.

I add one more sentence to documentation to emphasize that degree of proximity depends on parameter . And also I made restriction on parameter, now it can be only in range (0; 1)

> In the litterature the theta parameter seems to be often called alpha
> or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest to
> stick to "s" instead of "theta”?

I have renamed it to “s”.

> Functions zipfZeta(n, theta) does not really computes the zeta(n) function,
> so I think that a better name should be chosen. It seems to compute
> H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.

Renamed zipfZeta to zipfGeneralizedHarmonic, zetan to harmonicn.

> The handling of cache overflow by randomly removing one entry looks like
> a strange idea. Rather remove the oldest entry?

Replaced with Least Recently Used replacement algorithm.

> ISTM that it should print a warning once if the cache array overflows as performance would drop heavily.

Now it prints warning message if array overflowed. To print message only one time, it uses global flag, which is available for all threads.
And theoretically message can be printed more than one time.
It could be solved easily using pg_atomic_test_set_flag() from src/include/port/atomics.h but it can not be used in pgbench because of following lines of code there:

#ifdef FRONTEND
#error "atomics.h may not be included from frontend code"
#endif

Or it can be fixed by using mutexes from pthread, but I think code become less readable and more complex in this case.
So, should I spend time on solving this issue?

> If the zipf cache is constant size, there is no point in using dynamic allocation, just declare an array…

Fixed. Does ZIPF_CACHE_SIZE = 15 is ok?

> There should be non regression tests somehow. If the "improve pgbench
> tap test infrastructure" get through, things should be added there.

I will send tests later, as separate patch.


Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2017-07-20 16:52:09 Re: Increase Vacuum ring buffer.
Previous Message Alvaro Herrera 2017-07-20 16:26:23 Re: Cache lookup errors with functions manipulation object addresses