From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|

To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |

Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |

Subject: | Re: rand48 replacement |

Date: | 2021-07-04 16:03:56 |

Message-ID: | alpine.DEB.2.22.394.2107041718570.2359988@pseudo |

Views: | Raw Message | Whole Thread | Download mbox | Resend email |

Thread: | |

Lists: | pgsql-hackers |

> Now suppose we want a random number in the range [0,6). This is what

> happens with your algorithm for each of the possible prng() return

> values:

>

> prng() returns 0 -- OK

> prng() returns 1 -- OK

> prng() returns 2 -- OK

> prng() returns 3 -- OK

> prng() returns 4 -- OK

> prng() returns 5 -- OK

> prng() returns 6 -- out of range so use splitmix3() with initial state=6:

> state=6 -> 3, return 7 -- still out of range, so repeat

> state=3 -> 0, return 3 -- now OK

> prng() returns 7 -- out of range so use splitmix3() with initial state=7:

> state=7 -> 4, return 4 -- now OK

>

> So, assuming that prng() chooses each of the 8 possible values with

> equal probability (1/8), the overall result is that the values 0,1,2

> and 5 are returned with a probability of 1/8, whereas 3 and 4 are

> returned with a probability of 2/8.

Ok, I got that explanation.

> Using the correct implementation of the bitmask algorithm, each

> iteration calls prng() again, so in the end no particular return value

> is ever more likely than any other (hence it's unbiased).

Ok, you're taking into account the number of states of the PRNG, so this

finite number implies some bias on some values if you actually enumerate

all possible cases, as you do above.

I was reasoning "as if" the splitmix PRNG was an actual random function,

which is obviously false, but is also somehow a usual (false) assumption

with PRNGs, and with this false assumption my implementation is perfect:-)

The defect of the modulo method for range extraction is that even with an

actual (real) random generator the results would be biased. The bias is in

the method itself. Now you are arguing for a bias linked to the internals

of the PRNG. They are not the same in nature, even if the effect is the

same.

Also the bias is significant for close to the limit ranges, which is not

the kind of use case I have in mind when looking at pgbench.

If you consider the PRNG internals, then splitmix extraction function

could also be taken into account. If it is not invertible (I'm unsure),

then assuming it is some kind of hash function, about 1/e of output values

would not reachable, which is yet another bias that you could argue

against.

Using the initial PRNG works better only because the underlying 128 bits

state is much larger than the output value. Which is the point for having

a larger state in the first place, anyway.

> As for determinism, the end result is still fully deterministic. For

> example, lets say that prng() returns the following sequence, for some

> initial state:

>

> 1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...

>

> then the bitmask method just returns that sequence with all the 6's

> and 7's removed:

>

> 1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...

>

> and that same sequence will always be returned, when starting from

> that initial seed.

Yes and no.

The result is indeed deterministic of you call the function with the same

range. However, if you change the range value in one place then sometimes

the state can advance differently, so the determinism is lost, meaning

that it depends on actual range values.

Attached a v7 which does as you wish, but also looses the deterministic

non-value dependent property I was seeking. I would work around that by

deriving another 128 bit generator instead of splitmix 64 bit, but that is

overkill.

--

Fabien.

Attachment | Content-Type | Size |
---|---|---|

prng-7.patch | text/x-diff | 47.4 KB |

- Re: rand48 replacement at 2021-07-04 11:30:41 from Dean Rasheed

- Re: rand48 replacement at 2021-07-04 17:35:33 from Dean Rasheed

From | Date | Subject | |
---|---|---|---|

Next Message | Dean Rasheed | 2021-07-04 17:35:33 | Re: rand48 replacement |

Previous Message | Tom Lane | 2021-07-04 15:58:27 | Re: PostgreSQL-13.3 parser.y with positional references by named references |