Re: rand48 replacement

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-04 11:30:41
Message-ID: CAEZATCXcmccPQismOioD5kr2NMyMoeLpDaesizzoKh8M-BK5Dw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 4 Jul 2021 at 10:35, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> I did not understand why it is not correct.
>

Well, to make it easier to visualise, let's imagine our word size is
just 3 bits instead of 64 bits, and that the basic prng() function
generates numbers in the range [0,8). Similarly, imagine a splitmix3()
that operates on 3-bit values. So it might do something like this
(state offset and return values made up):

splitmix3(state):
state=0 -> 5, return 2
state=1 -> 6, return 5
state=2 -> 7, return 0
state=3 -> 0, return 3
state=4 -> 1, return 6
state=5 -> 2, return 1
state=6 -> 3, return 7
state=7 -> 4, return 4

Now suppose we want a random number in the range [0,6). This is what
happens with your algorithm for each of the possible prng() return
values:

prng() returns 0 -- OK
prng() returns 1 -- OK
prng() returns 2 -- OK
prng() returns 3 -- OK
prng() returns 4 -- OK
prng() returns 5 -- OK
prng() returns 6 -- out of range so use splitmix3() with initial state=6:
state=6 -> 3, return 7 -- still out of range, so repeat
state=3 -> 0, return 3 -- now OK
prng() returns 7 -- out of range so use splitmix3() with initial state=7:
state=7 -> 4, return 4 -- now OK

So, assuming that prng() chooses each of the 8 possible values with
equal probability (1/8), the overall result is that the values 0,1,2
and 5 are returned with a probability of 1/8, whereas 3 and 4 are
returned with a probability of 2/8.

Using the correct implementation of the bitmask algorithm, each
iteration calls prng() again, so in the end no particular return value
is ever more likely than any other (hence it's unbiased).

As for determinism, the end result is still fully deterministic. For
example, lets say that prng() returns the following sequence, for some
initial state:

1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...

then the bitmask method just returns that sequence with all the 6's
and 7's removed:

1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...

and that same sequence will always be returned, when starting from
that initial seed.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-07-04 12:00:37 Re: Mention --enable-tap-tests in the TAP section page
Previous Message Andrew Dunstan 2021-07-04 10:49:13 Re: Reducing the cycle time for CLOBBER_CACHE_ALWAYS buildfarm members