Re: rand48 replacement

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-05-24 11:28:07
Message-ID: alpine.DEB.2.22.394.2105241310350.165418@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Tomas,

>> I have given a go at proposing a replacement for rand48.
>
> So what is the motivation for replacing rand48? Speed, quality of produced
> random numbers, features rand48 can't provide, or what?

Speed can only be near rand48, see below. Quality (eg no trivial cycles,
does not fail too quickly on statistical tests) and soundness (the point
of generating 52-64 bits of data out of 48 bits, which means that only a
small part of the target space is covered, fails me). Also, I like having
a implementation independent interface (current rand48 tells the name of
the algorithm everywhere and the "uint16[3]" state type is hardcoded in
several places).

>> POSIX 1988 (?) rand48 is a LCG PRNG designed to generate 32 bits integers
>> or floats based on a 48 bits state on 16 or 32 bits architectures. LCG
>> cycles on the low bits, which can be quite annoying. Given that we run on
>> 64 bits architectures and that we need to generate 64 bits ints or doubles,
>> IMHO it makes very little sense to stick to that.
>>
>> We should (probably) want:
>>  - one reasonable default PRNG for all pg internal uses.
>>  - NOT to invent a new design!
>>  - something fast, close to rand48 (which basically does 2 arithmetic
>>    ops, so it is hard to compete)
>>    no need for something cryptographic though, which would imply slow
>>  - to produce 64 bits integers & doubles with a 52 bits mantissa,
>>    so state size > 64 bits.
>>  - a small state though, because we might generate quite a few of them
>>    for different purposes so state size <= 256 or even <= 128 bits
>>  - the state to be aligned to whatever => 128 bits
>>  - 64 bits operations for efficiency on modern architectures,
>>    but not 128 bits operations.
>>  - not to depend on special hardware for speed (eg MMX/SSE/AES).
>>  - not something with obvious known and relevant defects.
>>  - not something with "rights" attached.
>>
>> These constraints reduce drastically the available options from
>> https://en.wikipedia.org/wiki/List_of_random_number_generators
>>
>> The attached patch removes "rand48" and adds a "pg_prng" implementation
>> based on xoroshiro128ss, and replaces it everywhere. In pgbench, the non
>> portable double-relying code is replaced by hopefully portable ints. The
>> interface makes it easy to replace the underlying PRNG if something else is
>> desired.
>
> xoroshiro seems reasonable. How does it compare to rand48? Does it need much
> less/more state, is it faster/slower, etc.?

Basically any PRNG should be slower/comparable than rand48 because it only
does 2 arithmetic ops, you cannot do much less when trying to steer bits.
However because of the 16-bits unpacking/packing on 64 bits architecture
there is some room for additional useful ops, so in the end from the end
user the performance is only about 5% loweR.

State is 16 bytes vs 6 bytes for rand48. This is ok for generating 8 bytes
per round and is still quite small.

> I'd expect that it produces better random sequence, considering rand48
> is a LCG, which is fairly simple decades old design.

Yep, it does not cycle trivially on low bits compared to an LCG (eg odd ->
even -> odd -> even -> ...), e.g. if you have the bad idea to do "% 2" on
an LCG to extract a bool you just alternate.

To summarize:
- better software engineering
- similar speed (slightly slower)
- better statistical quality
- quite small state
- soundness

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2021-05-24 11:39:16 Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Previous Message Tomas Vondra 2021-05-24 10:57:13 Re: rand48 replacement