Re: rand48 replacement

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-11-27 19:11:34
Message-ID: alpine.DEB.2.22.394.2111271951200.223291@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Tom,

Thanks for the feedback.

> +/* use Donald Knuth's LCG constants for default state */
>
> How did Knuth get into this? This algorithm is certainly not his,
> so why are those constants at all relevant?

They are not more nor less relevant than any other "random" constant. The
state needs a default initialization. The point of using DK's is that it
is somehow cannot be some specially crafted value which would have some
special property only know to the purveyor of the constant and could be
used by them to break the algorithm.

https://en.wikipedia.org/wiki/Dual_EC_DRBG

> * I could do without the stream-of-consciousness notes in pg_prng.c.
> I think what's appropriate is to say "We use thus-and-such a generator
> which is documented here", maybe with a line or two about its properties.

The stuff was really written essentially as a "why this" for the first
patch, and to prevent questions about "why not this other generator"
later, because it could never stop.

> * Function names like these convey practically nothing to readers:
>
> +extern int64 pg_prng_i64(pg_prng_state *state);
> +extern uint32 pg_prng_u32(pg_prng_state *state);
> +extern int32 pg_prng_i32(pg_prng_state *state);
> +extern double pg_prng_f64(pg_prng_state *state);
> +extern bool pg_prng_bool(pg_prng_state *state);

The intention is obviously "postgres pseudo-random number generator for
<type>". ISTM that it conveys (1) that it is a postgres-specific stuff,
(2) that it is a PRNG (which I find *MUCH* more informative than the
misleading statement that something is random when it is not, and it is
shorter) and (3) about the type it returns, because C does require
functions to have distinct names.

What would you suggest?

> and these functions' header comments add a grand total of zero bits
> of information.

Yes, probably. I do not like not to comment at all on a function.

> What someone generally wants to know first about a PRNG is (a) is it
> uniform and (b) what is the range of outputs, neither of which are
> specified anywhere.

ISTM (b) is suggested thanks to the type and (a) I'm not sure about a PRNG
which would claim not at least claim to be uniform. Non uniform PRNG are
usually built based on a uniform one.

What do you suggest as alternate names?

> +#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
> +#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
> +#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)
>
> I'm not sure that these named constants are any more readable than
> writing the underlying constant, maybe less so --- in particular
> I think something based on (1<<52)-1 would be more appropriate for
> the float mantissa operations. We don't need RIGHT_HALF_MASK at
> all, the casts to uint32 or int32 will accomplish that just fine.

Yep. I did it for uniformity.

> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
> rather than returning "v & 1" for pg_prng_bool?

Because some PRNG are very bad in the low bits, not xoroshiro stuff,
though.

> Is xoroshiro128ss less random in the low-order bits than the higher?
> If so, that would be a pretty important thing to document. If it's not,
> we shouldn't make the code look like it is.

Dunno. Why should we prefer low bits?

> + * select in a range with bitmask rejection.
>
> What is "bitmask rejection"? Is it actually important to callers?

No, it is important to understand how it does it. That is the name of the
technique which is implemented, which helps if you want to understand what
is going on by googling it. This point could be moved inside the function.

> I think this should be documented more like "Produce a random
> integer uniformly selected from the range [rmin, rmax)."

Sure.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-11-27 19:27:19 Re: SSL Tests for sslinfo extension
Previous Message Mark Dilger 2021-11-27 18:05:16 Re: Non-superuser subscription owners