Re: random() (was Re: New GUC to sample log queries)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, vik(dot)fearing(at)2ndquadrant(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random() (was Re: New GUC to sample log queries)
Date: 2018-12-27 23:00:02
Message-ID: 17235.1545951602@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> Another idea, which would be a lot less prone to breakage by
>> add-on code, is to change drandom() and setseed() to themselves
>> use pg_erand48() with a private seed.

> The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits
> architectures (which it is probably not, but why not going to 64 bits or
> 128 bits directly looks like a missed opportunity), its internal state is
> 48 bits as its name implies, and its period probably around 2**48, which
> is 2**12 better than the previous case, not an extraordinary achievement.

I can't get terribly excited about rewriting that code. You're arguing
from a "one size should fit all" perspective, which is exactly not the
design approach we're using. We've already converted security-sensitive
PRNG uses to use pg_strong_random (or if we haven't, that's a localized
bug in any such places we missed). What remains are places where we are
not so concerned about cryptographic strength but rather speed. It does
behoove us to ensure that the seed values are unpredictable and that a
user-controllable seed isn't used for internal operations, but I don't
feel a need for replacing the algorithm.

You might argue that the SQL function drandom should be held to a higher
standard, but we document exactly this same tradeoff for it. Users who
want cryptographic strength are directed to pgcrypto, leaving the audience
for drandom being people who want speed.

I do notice a couple of things that could be improved about erand48.c:

1. The _rand48_mult and _rand48_add values are constants, but probably few
compilers would notice that, given that they're assigned to in pg_srand48.
I think we should replace the references to those variables with direct
uses of the macros for their values, to save a few cycles in _dorand48.

2. There seems to be a bug in pg_jrand48 (which is relatively new, added
in fe0a0b599, without bothering to update the header comment).
Per POSIX,

The mrand48() and jrand48() functions return signed long integers
uniformly distributed over the interval [-2^31, 2^31).

However, if "long" is 64 bits what you're actually going to get is
unsigned values ranging up to 2^32-1. This doesn't matter to existing
callers because they coerce the value to int32 or uint32, but it's going
to bite somebody eventually.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-12-27 23:00:03 Re: Poor buildfarm coverage of strong-random alternatives
Previous Message Michael Paquier 2018-12-27 22:56:54 Re: removal of dangling temp tables