From: | Hironobu SUZUKI <hironobu(at)interdb(dot)jp> |
---|---|

To: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |

Subject: | Re: pgbench - add pseudo-random permutation function |

Date: | 2018-09-09 22:35:30 |

Message-ID: | f30bc717-5532-c2d8-e513-b4912c10ddbe@interdb.jp |

Views: | Raw Message | Whole Thread | Download mbox |

Thread: | |

Lists: | pgsql-hackers |

Hi Fabian,

I reviewed `pgbench-prp-func-1.patch` and found an incomplete

implementation.

In the pseudorandom_perm function, I confirmed that the scramble and

scatter operations are mathematically bijections. Therefore, this

function is mathematically correct.

However, the implementation of the scatter operation in this patch

overflows in many cases if the variable:size is 38 bit integer or

greater. Because the variable:size and the item of the array:primes[]

which stores 27-29 bit integers are multiplicated. If overflow occurs,

the scatter operation does not satisfy bijective.

I did a sampling inspection, whose conditions are the variable:size is

1099511627773 (= 40 bit integer) and the variable:seed is 5432. Even

with very few samples, I found a huge number of collisions as shown below:

pr_perm(409749816, 1099511627773, 5432) = pr_perm(491041141,

1099511627773, 5432) = pr_perm(729171766, 1099511627773, 5432) =

pr_perm(849775914, 1099511627773, 5432) = 22445482629,

pr_perm(45609644, 1099511627773, 5432) = pr_perm(174005122,

1099511627773, 5432) = pr_perm(811754941, 1099511627773, 5432) =

pr_perm( 1131176891, 1099511627773, 5432) = 10626826534,

and so on.

There are two ways to resolve this issue. The first one is to reduce the

maximum value can be set for the variable:size. The second one is to add

a modular multiplication function to avoid overflow. I made a patch

including the function that can be calculated modular multiplication

without overflow, and attached this mail.

(I also attached a simple test suite of the function I added.)

In the others parts of the original patch, I could apply the patch and

did tests without trouble; the documentations and comments are well

except one comment in the function shown below:

>> (1) scramble: partial xors on power-or-2 subsets

I could not understand this meaning when I read it at the first time.

Best regards,

On 2018/07/28 15:03, Fabien COELHO wrote:

>

> Hello,

>

> This patch adds a pseudo-random permutation function to pgbench. It

> allows to mix non uniform random keys to avoid trivial correlations

> between neighboring values, hence between pages.

>

> The function is a simplistic form of encryption adapted to any size,

> using a few iterations of scramble and scatter phases. The result is not

> cryptographically convincing, nor even statistically, but it is quite

> inexpensive and achieves the desired result. A computation costs 0.22 µs

> per call on my laptop, about three times the cost of a simple function.

>

> Alternative designs, such as iterating over an actual encryption

> function or using some sbox, would lead to much more costly solutions

> and complex code.

>

> I also join a few scripts I used for testing.

>

Attachment | Content-Type | Size |
---|---|---|

pgbench-prp-func-2.patch | text/plain | 13.0 KB |

modular_multiplicate.c | text/plain | 2.6 KB |

modular_multiplicate.dat | text/plain | 1.8 KB |

modular_multiplicate_test.sh | application/x-sh | 416 bytes |

- pgbench - add pseudo-random permutation function at 2018-07-28 14:03:49 from Fabien COELHO

- Re: pgbench - add pseudo-random permutation function at 2018-09-11 14:57:33 from Fabien COELHO
- Re: pgbench - add pseudo-random permutation function at 2018-09-12 09:26:53 from Fabien COELHO

From | Date | Subject | |
---|---|---|---|

Next Message | Higuchi, Daisuke | 2018-09-10 02:01:53 | stat() on Windows might cause error if target file is larger than 4GB |

Previous Message | Tomas Vondra | 2018-09-09 21:49:59 | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |