Re: random() (was Re: New GUC to sample log queries)

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, vik(dot)fearing(at)2ndquadrant(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random() (was Re: New GUC to sample log queries)
Date: 2018-12-28 08:18:37
Message-ID: alpine.DEB.2.21.1812280808490.32444@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Tom,

>>> Another idea, which would be a lot less prone to breakage by
>>> add-on code, is to change drandom() and setseed() to themselves
>>> use pg_erand48() with a private seed.
>
>> The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits
>> architectures (which it is probably not, but why not going to 64 bits or
>> 128 bits directly looks like a missed opportunity), its internal state is
>> 48 bits as its name implies, and its period probably around 2**48, which
>> is 2**12 better than the previous case, not an extraordinary achievement.
>
> I can't get terribly excited about rewriting that code. You're arguing
> from a "one size should fit all" perspective,

Not exactly. I'm not claiming that distinguishing parts that require
good random from others is a bad choice. I'm arguing that:

(1) from a software engineering perspective, the PRNG implementation
should hide the underlying generator name and state size.

(2) from a numerical perspective, poor seedings practice should be avoided
when possible.

(3) from a cryptographic perspective, LCG is a poor choice of fast PRNG,
which a quick look at wikipedia (mother of all knowledge) confirms.

(4) the fact that pg historical PRNG choices are well documented is not a
good justification for not improving them.

Better alternatives exist that do not cost much (eg xorshift variants,
WELL...), some of which are optimized for 64 bits architectures.

> which is exactly not the design approach we're using.

> We've already converted security-sensitive PRNG uses to use
> pg_strong_random (or if we haven't, that's a localized bug in any such
> places we missed). What remains are places where we are not so
> concerned about cryptographic strength but rather speed.

I do agree with the speed concern. I'm arguing that better quality at
speed can be achieved with better seeding practices and not using a LCG.

About costs, not counting array accesses:

- lrand48 (48 bits state as 3 uint16) is 29 ops
(10 =, 8 *, 7 +, 4 >>)
- xorshift+ (128 bits state as 2 uint64) is 13 ops
( 5 =, 0 *, 1 +, 3 >>, 4 ^)
- xororshift128+ (idem) is 17 ops
( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2 |, less if rot in hardware)
- WELL512 (512 bits state as 16 uint32) is 38 ops
(11 =, 0 *, 3 +, 7 >>, 10 ^, 4 &)
probably much better, but probably slower than the current version

I'd be of the (debatable) opinion that we could use xororshift128+,
already used by various languages, even if it fails some specialized
tests.

> It does behoove us to ensure that the seed values are unpredictable and
> that a user-controllable seed isn't used for internal operations,

Sure.

> but I don't feel a need for replacing the algorithm.

Hmmm. Does it mean that you would veto any change, even if the speed
concern is addressed (i.e. faster/not slower with better quality)?

> You might argue that the SQL function drandom should be held to a higher
> standard, but we document exactly this same tradeoff for it. Users who
> want cryptographic strength are directed to pgcrypto, leaving the audience
> for drandom being people who want speed.

My point is that speed is not necessary incompatible with better quality.
Better quality should not replace strong random when needed.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-12-28 08:28:11 Re: postgres_fdw: oddity in costing aggregate pushdown paths
Previous Message Tatsuro Yamada 2018-12-28 08:17:06 Re: [HACKERS] CLUSTER command progress monitor