Re: pgbench - add pseudo-random permutation function

From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-09 22:35:30
Message-ID: f30bc717-5532-c2d8-e513-b4912c10ddbe@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Fabian,

I reviewed `pgbench-prp-func-1.patch` and found an incomplete
implementation.

In the pseudorandom_perm function, I confirmed that the scramble and
scatter operations are mathematically bijections. Therefore, this
function is mathematically correct.

However, the implementation of the scatter operation in this patch
overflows in many cases if the variable:size is 38 bit integer or
greater. Because the variable:size and the item of the array:primes[]
which stores 27-29 bit integers are multiplicated. If overflow occurs,
the scatter operation does not satisfy bijective.

I did a sampling inspection, whose conditions are the variable:size is
1099511627773 (= 40 bit integer) and the variable:seed is 5432. Even
with very few samples, I found a huge number of collisions as shown below:

pr_perm(409749816, 1099511627773, 5432) = pr_perm(491041141,
1099511627773, 5432) = pr_perm(729171766, 1099511627773, 5432) =
pr_perm(849775914, 1099511627773, 5432) = 22445482629,
pr_perm(45609644, 1099511627773, 5432) = pr_perm(174005122,
1099511627773, 5432) = pr_perm(811754941, 1099511627773, 5432) =
pr_perm( 1131176891, 1099511627773, 5432) = 10626826534,
and so on.

There are two ways to resolve this issue. The first one is to reduce the
maximum value can be set for the variable:size. The second one is to add
a modular multiplication function to avoid overflow. I made a patch
including the function that can be calculated modular multiplication
without overflow, and attached this mail.
(I also attached a simple test suite of the function I added.)

In the others parts of the original patch, I could apply the patch and
did tests without trouble; the documentations and comments are well
except one comment in the function shown below:

>> (1) scramble: partial xors on power-or-2 subsets

I could not understand this meaning when I read it at the first time.

Best regards,

On 2018/07/28 15:03, Fabien COELHO wrote:
>
> Hello,
>
> This patch adds a pseudo-random permutation function to pgbench. It
> allows to mix non uniform random keys to avoid trivial correlations
> between neighboring values, hence between pages.
>
> The function is a simplistic form of encryption adapted to any size,
> using a few iterations of scramble and scatter phases. The result is not
> cryptographically convincing, nor even statistically, but it is quite
> inexpensive and achieves the desired result. A computation costs 0.22 µs
> per call on my laptop, about three times the cost of a simple function.
>
> Alternative designs, such as iterating over an actual encryption
> function or using some sbox, would lead to much more costly solutions
> and complex code.
>
> I also join a few scripts I used for testing.
>

Attachment Content-Type Size
pgbench-prp-func-2.patch text/plain 13.0 KB
modular_multiplicate.c text/plain 2.6 KB
modular_multiplicate.dat text/plain 1.8 KB
modular_multiplicate_test.sh application/x-sh 416 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Higuchi, Daisuke 2018-09-10 02:01:53 stat() on Windows might cause error if target file is larger than 4GB
Previous Message Tomas Vondra 2018-09-09 21:49:59 Re: [HACKERS] PATCH: multivariate histograms and MCV lists