Re: rand48 replacement

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-02 21:51:42
Message-ID: alpine.DEB.2.22.394.2107022340250.2359988@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Dean,

> It may be true that the bias is of the same magnitude as FP multiply,
> but it is not of the same nature. With FP multiply, the
> more-likely-to-be-chosen values are more-or-less evenly distributed
> across the range, whereas modulo concentrates them all at one end,
> making it more likely to bias test results.

Yes, that is true.

> It's worth paying attention to how other languages/libraries implement
> this, and basically no one chooses the modulo method, which ought to
> raise alarm bells. Of the biased methods, it has the worst kind of
> bias and the worst performance.

Hmmm. That is not exactly how I interpreted the figures in the blog.

> If a biased method is OK, then the biased integer multiply method
> seems to be the clear winner. This requires the high part of a
> 64x64-bit product, which is trivial if 128-bit integers are available,
> but would need a little more work otherwise. There's some code in
> common/d2s that might be suitable.

And yes, modulo is expensive. If we allow 128 bits integers operations, I
would not choose this RNPG in the first place, I'd take PCG with a 128 bits state.
That does not change the discussion about bias, though.

> Most other implementations tend to use an unbiased method though, and I
> think it's worth doing the same. It might be a bit slower, or even
> faster depending on implementation and platform, but in the context of
> the DB as a whole, I don't think a few extra cycles matters either way.

Ok ok ok, I surrender!

> The method recommended at the very end of that blog seems to be pretty
> good all round, but any of the other commonly used unbiased methods
> would probably be OK too.

That does not address my other issues with the proposed methods, in
particular the fact that the generated sequence is less deterministic, but
I think I have a simple way around that. I'm hesitating to skip to the the
bitmask method, and give up performance uniformity. I'll try to come up
with something over the week-end, and also address Tom's comments in
passing.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2021-07-02 22:15:45 Re: Preventing abort() and exit() calls in libpq
Previous Message Paul A Jungwirth 2021-07-02 21:39:50 Re: SQL:2011 application time