Re: rand48 replacement

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-05 06:36:27
Message-ID: 3136e2672d5ab395521bae05a80df469@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fabien COELHO писал 2021-07-04 23:29:
>> The important property of determinism is that if I set a seed, and
>> then make an identical set of calls to the random API, the results
>> will be identical every time, so that it's possible to write tests
>> with predictable/repeatable results.
>
> Hmmm… I like my stronger determinism definition more than this one:-)
>
>>> I would work around that by deriving another 128 bit generator
>>> instead of splitmix 64 bit, but that is overkill.
>>
>> Not really relevant now, but I'm pretty sure that's impossible to do.
>> You might try it as an interesting academic exercise, but I believe
>> it's a logical impossibility.
>
> Hmmm… I was simply thinking of seeding a new pg_prng_state from the
> main pg_prng_state with some transformation, and then iterate over the
> second PRNG, pretty much like I did with splitmix, but with 128 bits
> so that your #states argument does not apply, i.e. something like:
>
> /* select in a range with bitmask rejection */
> uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range)
> {
> /* always advance state once */
> uint64 next = xoroshiro128ss(state);
> uint64 val;
>
> if (range >= 2)
> {
> uint64 mask = mask_u64(range-1);
>
> val = next & mask;
>
> if (val >= range)
> {
> /* copy and update current prng state */
> pg_prng_state iterator = *state;
>
> iterator.s0 ^= next;
> iterator.s1 += UINT64CONST(0x9E3779B97f4A7C15);
>
> /* iterate till val in [0, range) */
> while ((val = xoroshiro128ss(&iterator) & mask) >= range)
> ;
> }
> }
> else
> val = 0;
>
> return val;
> }
>
> The initial pseudo-random sequence is left to proceed, and a new PRNG
> is basically forked for iterating on the mask, if needed.

I believe most "range" values are small, much smaller than UINT32_MAX.
In this case, according to [1] fastest method is Lemire's one (I'd take
original version from [2])

Therefore combined method pg_prng_u64_range could branch on range value

uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range)
{
uint64 val = xoroshiro128ss(state);
uint64 m;
if ((range & (range-1) == 0) /* handle all power 2 cases */
return range != 0 ? val & (range-1) : 0;
if (likely(range < PG_UINT32_MAX/32)
{
/*
* Daniel Lamire's unbiased range random algorithm based on
rejection sampling
* https://lemire.me/blog/2016/06/30/fast-random-shuffling/
*/
m = (uint32)val * range;
if ((uint32)m < range)
{
uint32 t = -range % range;
while ((uint32)m < t)
m = (uint32)xoroshiro128ss(state) * range;
}
return m >> 32;
}
/* Apple's mask method */
m = mask_u64(range-1);
val &= m;
while (val >= range)
val = xoroshiro128ss(state) & m;
return val;
}

Mask method could also be faster when range is close to mask.
For example, fast check for "range is within 1/1024 to mask" is
range < (range/512 + (range&(range*2)))

And then method choose could like:
if (likely(range < UINT32_MAX/32 && range > (range/512 +
(range&(range*2)))))

But I don't know does additional condition worth difference or not.

[1] https://www.pcg-random.org/posts/bounded-rands.html
[2] https://lemire.me/blog/2016/06/30/fast-random-shuffling/

regards,
Sokolov Yura

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ronan Dunklau 2021-07-05 06:38:28 Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Previous Message Masahiro Ikeda 2021-07-05 06:28:58 Re: Transactions involving multiple postgres foreign servers, take 2