Re: rand48 replacement

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-07 10:29:07
Message-ID: CAEZATCX2qmzW+mVgpFM1HWwP_7bu_PnPvJomKMC58u02Ls39+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 7 Jul 2021 at 04:00, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>
> Anyway, excuse me for heating this discussion cause of such
> non-essential issue.
> I'll try to control myself and don't proceed it further.
>

Whilst it has been interesting learning and discussing all these
different techniques, I think it's probably best to stick with the
bitmask method, rather than making the code too complex and difficult
to follow. The bitmask method has the advantage of being very simple,
easy to understand and fast (fastest in many of the benchmarks, and
close enough in others to make me think that the difference won't
matter for our purposes).

To test the current patch, I hacked up a simple SQL-callable server
function: random(bigint, bigint) returns bigint, similar to the one in
pgbench. After doing so, I couldn't help thinking that it would be
useful to have such a function in core, so maybe that could be a
follow-on patch. Anyway, that led to the following observations:

Firstly, there's a bug in the existing mask_u64() code -- if
pg_leftmost_one_pos64(u) returns 63, you end up with a mask equal to
0, and it breaks down.

Secondly, I think it would be simpler to implement this as a bitshift,
rather than a bitmask, using the high bits from the random number.
That might not make much difference for xoroshiro**, but in general,
PRNGs tend to be weaker in the lower bits, so it seems preferable on
that basis. But also, it makes the code simpler and less error-prone.

Finally, I think it would be better to treat the upper bound of the
range as inclusive. Doing so makes the function able to cover all
possible 64-bit ranges. It would then be easy (perhaps in another
follow-on patch) to make the pgbench random() function work for all
64-bit bounds (as long as max >= min), without the weird overflow
checking it currently has.

Putting those 3 things together, the code (minus comments) becomes:

if (range > 0)
{
int rshift = 63 - pg_leftmost_one_pos64(range);

do
{
val = xoroshiro128ss(state) >> rshift;
}
while (val > range);
}
else
val = 0;

which reduces the complexity a bit.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-07-07 10:53:23 Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Previous Message Boris Kolpackov 2021-07-07 09:38:33 Re: Pipeline mode and PQpipelineSync()