Re: rand48 replacement

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-03 15:26:00
Message-ID: alpine.DEB.2.22.394.2107031643010.2359988@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Yura,

> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`
> 2. I don't see why pg_prng_state could not be `typedef uint64
> pg_prng_state[2];`

It could, but I do not see that as desirable. From an API design point of
view we want something clean and abstract, and for me a struct looks
better for that. It would be a struct with an array insided, I think that
the code is more readable by avoiding constant index accesses (s[0] vs
s0), we do not need actual indexes.

> 3. Then SamplerRandomState and pgbench RandomState could stay.
> Patch will be a lot shorter.

You mean "typedef pg_prng_state SamplerRandomState"? One point of the
patch is to have "one" standard PRNG commonly used where one is needed, so
I'd say we want the name to be used, hence the substitutions.

Also, I have a thing against objects being named "Random" which are not
random, which is highly misleading. A PRNG is purely deterministic.
Removing misleading names is also a benefit.

So If people want to keep the old name it can be done. But I see these
name changes as desirable.

> I don't like mix of semantic refactoring and syntactic refactoring in
> the same patch. While I could agree with replacing `SamplerRandomState
> => pg_prng_state`, I'd rather see it in separate commit. And that
> separate commit could contain transition: `typedef uint64
> pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

I would tend to agree on principle, but separating in two phases here
looks pointless: why implementing a cleaner rand48 interface, which would
then NOT be the rand48 standard, just to upgrade it to something else in
the next commit? And the other path is as painfull and pointless.

So I think that the new feature better comes with its associated
refactoring which is an integral part of it.

> 4. There is no need in ReservoirStateData->randstate_initialized. There could
> be macros/function:
> `bool pg_prng_initiated(state) { return (state[0]|state[1]) != 0; }`

It would work for this peculiar implementation but not necessary for
others that we may want to substitute later, as it would mean either
breaking the interface or adding a boolean in the structure if there is no
special unintialized state that can be detected, which would impact memory
usage and alignment.

So I think it is better to keep it that way, usually the user knows
whether its structure has been initialized, and the special case for
reservoir where the user does not seem to know about that can handle its
own boolean without impacting the common API or the data structure.

> 5. Is there need for 128 bit prng at all?

This is a 64 bits PRNG with a 128 bit state. We are generating 64 bits
values, so we want a 64 bit PRNG. A prng state must be larger than its
generated value, so we need sizeof(state) > 64 bits, hence at least 128
bits if we add 128 bits memory alignement.

> And there is 4*32bit xoshiro128:
> https://prng.di.unimi.it/xoshiro128plusplus.c
> 32bit operations are faster on 32bit platforms.
> But 32bit platforms are quite rare in production this days.
> Therefore I don't have strong opinion on this.

I think that 99.9% hardware running postgres is 64 bits, so 64 bits is the
right choice.

--
Fabien.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-07-03 15:36:02 Re: rand48 replacement
Previous Message Tom Lane 2021-07-03 15:23:56 Re: Preventing abort() and exit() calls in libpq