Re: rand48 replacement

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-04 09:35:09
Message-ID: alpine.DEB.2.22.394.2107041115130.2359988@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Dean,

>> - moves the stuff to common and fully removes random/srandom (Tom)
>> - includes a range generation function based on the bitmask method (Dean)
>> but iterates with splitmix so that the state always advances once (Me)
>
> At the risk of repeating myself: do *not* invent your own scheme.

> The problem with iterating using splitmix is that splitmix is a simple
> shuffling function that takes a single input and returns a mutated
> output depending only on that input.

It also iterates over its 64 bits state in a round robin fashion so that
the cycle size is 2^64 (it is a simple add).

> So let's say for simplicity that you're generating numbers in the range
> [0,N) with N=2^64-n and n<2^63. Each of the n values in [N,2^64) that
> lie outside the range wanted are just mapped in a deterministic way back
> onto (at most) n values in the range [0,N), making those n values twice
> as likely to be chosen as the other N-n values.

I do understand your point. If the value is outside the range, splitmix
iterates over its seed and the extraction functions produces a new number
which is tested again. I do not understand the "mapped back onto" part,
the out of range value is just discarded and the loops starts over with a
new derivation, and why it would imply that some values are more likely to
come out.

> So what you've just invented is an algorithm with the complexity of the
> unbiased bitmask method,

That is what I am trying to implement.

> but basically the same bias as the biased integer multiply method.

I did not understand.

> I don't understand why you object to advancing the state more than
> once. Doing so doesn't make the resulting sequence of numbers any less
> deterministic.

It does, somehow, hence my struggle to try to avoid it.

call seed(0xdeadbeef);
x1 = somepseudorand();
x2 = somepseudorand();
x3 = somepsuedorand();

I think we should want x3 to be the same result whatever the previous
calls to the API.

> In fact, I'm pretty sure you *have to* advance the state more than
> once in *any* unbiased scheme. That's a common characteristic of all
> the unbiased methods I've seen, and intuitively, I think it has to be
> so.

Yes and no. We can advance another state seeded by the root prng.

> Otherwise, I'm happy with the use of the bitmask method, as long as
> it's implemented correctly.

I did not understand why it is not correct.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-07-04 10:38:33 Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM
Previous Message David Rowley 2021-07-04 08:53:06 Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values