Re: pgbench - add pseudo-random permutation function

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-02-14 21:17:53
Message-ID: alpine.DEB.2.21.1902142155050.20189@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Andres,

>> +# PGAC_C_BUILTIN_CLZLL
>
> I think this has been partially superceded by
>
> commit 711bab1e4d19b5c9967328315a542d93386b1ac5
> Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> Date: 2019-02-13 16:10:06 -0300

Indeed, the patch needs a rebase & conflit resolution. I'll do it. Later.

>> <para>
>> + Function <literal>pr_perm</literal> implements a pseudo-random permutation.
>> + It allows to mix the output of non uniform random functions so that
>> + values drawn more often are not trivially correlated.
>> + It permutes integers in [0, size) using a seed by applying rounds of
>> + simple invertible functions, similarly to an encryption function,
>> + although beware that it is not at all cryptographically secure.
>> + Compared to <literal>hash</literal> functions discussed above, the function
>> + ensures that a perfect permutation is applied: there are no collisions
>> + nor holes in the output values.
>> + Values outside the interval are interpreted modulo the size.
>> + The function errors if size is not positive.
>> + If no seed is provided, <literal>:default_seed</literal> is used.
>> + For a given size and seed, the function is fully deterministic: if two
>> + permutations on the same size must not be correlated, use distinct seeds
>> + as outlined in the previous example about hash functions.
>> + </para>
>
> This doesn't really explain why we want this in pgbench.

Who is "we"?

If someone runs non uniform tests, ie with random_exp/zipf/gauss, close
values are drawn with a similar frequency, thus correlated, inducing an
undeserved correlation at the page level (eg for read) and better
performance that would be the case if relative frequencies were not
correlated to key values.

So the function allows having more realistic non uniform test, whereas
currently we can only have non uniform test with very unrealistically
correlated values at the key level and possibly at the page level, meaning
non representative performances because of these induced bias.

This is under the assumption that pgbench should allow more realistic
performance test scenarii, which I believe is a desirable purpose. If
someone disagree with this purpose, then they would consider both non
uniform random functions and this proposed pseudo-random permutation
function as useless, as probably most other additions to pgbench.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message legrand legrand 2019-02-14 21:21:39 Re: Planning counters in pg_stat_statements (using pgss_store)
Previous Message Jacob Champion 2019-02-14 21:01:55 Re: libpq debug log