Re: pgsql: Use a separate random seed for SQL random()/setseed() functions.

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)lists(dot)postgresql(dot)org, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Use a separate random seed for SQL random()/setseed() functions.
Date: 2018-12-30 10:06:52
Message-ID: alpine.DEB.2.21.1812300824140.26970@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers


Hello Tom,

I'm sorry I'm a bit late back into this discussion, I was on the road.

> To fix this reliably, we need random() and setseed() to use their own
> private random state variable.

Ok.

> Standard random(3) isn't amenable to such usage, so let's switch to
> pg_erand48().

Hmmm… bad idea?

> It's hard to say whether that's more or less "random" than any
> particular platform's version of random(3),

It looks much less pseudo-random on Linux: POSIX provides 3 pseudo-random
functions (probably 2 too many): "random", "rand" (mapped on the previous
one by glibc), and ".rand48". According to glibc documentation, "random"
has an internal state of 31 long integers i.e. 992 bits (checked into the
source code, although why it can only be seeded from 32 bits fails me)
with a "nonlinear additive feedback" PRNG, vs 48 bits of rand48 linear
congruential generator PRNG, also seeded from 32 bits.

For me, a 48 bit state is inadequate for anything but a toy application
that would need a few casual pseudo-random numbers. I recommand against
using the rand48 for a backend purpose which might have any, even remote,
security implication.

As rand48 is a LCG, it cycles on the low-order bits so that they should
not be used (although they are for erand48). The 48 bit state looks like a
80's design, when hardware was between 16 and 32 bit, and was still in use
in the 90's so that it is in POSIX and Java. I can retro-explain it as
follows: the aim was to produce reasonable 32 bit pseudo-random ints on
slow machines while not using low-order bits, so 48 was the closest
round-up possible. Why not go up to 64 bits was very probably because it
would have required more expensive mults to simulate 64 bit multiply on a
16 or 32 bit architecture. The 48-bit LCG makes it "good enough" for less
than a cubic root of size samples, i.e. 2**16 draws. This is much too
small on today GHz hardware.

ISTM that 64 bits would be on the too-low side as well. I'd shop for a 128
or 256-bit state generator. I'm unsure of the best choice, though. I have
looked at "xorshift128+" and "xoshiro256**", which have some critics
(basically, non-cryptographic PRNG can have their state rebuilt from a few
outputs, and there usually is a simple transformation on outputs which
make it fails statistical tests). ISTM that "xoshiro256**" would be a
reasonable choice, much better than "rand48". An LCG with a larger state
(>= 128) could be admissible as well.

> but it does have a wider seed value and a longer period than are required
> by POSIX, so we can hope that this isn't a big downgrade.

I'd say that it is a significant downgrade that I wish postgres woud
avoid, especially with the argument that it for better security!

I'd suggest again that (1) postgres should provide an
algorithm-independent interface to its PRNG with an external state and (2)
use an alternative to rand48, the choice of which should be discussed.

--
Fabien.

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2018-12-30 18:42:11 pgsql: Teach eval_const_expressions to constant-fold LEAST/GREATEST exp
Previous Message Michael Paquier 2018-12-30 05:36:04 pgsql: Trigger stmt_beg and stmt_end for top-level statement blocks of

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2018-12-30 14:29:43 Re: Using Btree to Provide Sorting on Suffix Keys with LIMIT
Previous Message Vik Fearing 2018-12-30 08:31:48 Re: Optimize constant MinMax expressions