Re: random() (was Re: New GUC to sample log queries)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, vik(dot)fearing(at)2ndquadrant(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random() (was Re: New GUC to sample log queries)
Date: 2018-12-28 17:29:52
Message-ID: 1551.1546018192@sss.pgh.pa.us
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> but I don't feel a need for replacing the algorithm.

> Hmmm. Does it mean that you would veto any change, even if the speed
> concern is addressed (i.e. faster/not slower with better quality)?

Well, not veto exactly, but I'd be suspicious of it.

First, erand48 has been around a *long* time and its properties are pretty
well understood; these other algorithms you found on the net have no real
pedigree IMO. Moreover, since it is standard, there's a lower cognitive
burden on people to understand what it is and what it can be trusted for.

Second, we don't actually have a problem we need to fix by changing the
algorithm. We do need to worry about keeping drandom's state separate
from the internal random() usages, but that's independent of what the
algorithm is. Nobody has complained that random() is insufficiently
random, only that the seed might be predictable.
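
To illustrate the state-separation point, here is a minimal sketch using the
POSIX erand48() interface, which takes its state as an explicit argument.
The names user_seed, internal_seed, user_random and internal_random are
hypothetical and are not PostgreSQL's actual identifiers; the seed values are
placeholders, and real code would presumably seed from a less predictable
source.

```c
#define _XOPEN_SOURCE 600
#include <stdlib.h>

/*
 * Two independent 48-bit states (hypothetical names, placeholder seeds).
 * One would back the SQL-visible function, the other internal callers.
 */
static unsigned short user_seed[3] = {0x330e, 0xabcd, 0x1234};
static unsigned short internal_seed[3] = {0x330e, 0x5678, 0x9abc};

/* Draw a double in [0, 1) from the user-visible stream only. */
static double
user_random(void)
{
    return erand48(user_seed);
}

/* Draw a double in [0, 1) from the internal stream only. */
static double
internal_random(void)
{
    return erand48(internal_seed);
}
```

Because each call passes its own array, draws from one stream never advance
the other, whichever generator sits underneath.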

I do agree, after closer inspection, that our current coding of _dorand48
is a hangover from machines lacking 64-bit arithmetic. glibc does it like
this:

int
__drand48_iterate (unsigned short int xsubi[3], struct drand48_data *buffer)
{
  uint64_t X;
  uint64_t result;

  /* Initialize buffer, if not yet done. */
  if (__glibc_unlikely (!buffer->__init))
    {
      buffer->__a = 0x5deece66dull;
      buffer->__c = 0xb;
      buffer->__init = 1;
    }

  /* Do the real work. We choose a data type which contains at least
     48 bits. Because we compute the modulus it does not care how
     many bits really are computed. */

  X = (uint64_t) xsubi[2] << 32 | (uint32_t) xsubi[1] << 16 | xsubi[0];

  result = X * buffer->__a + buffer->__c;

  xsubi[0] = result & 0xffff;
  xsubi[1] = (result >> 16) & 0xffff;
  xsubi[2] = (result >> 32) & 0xffff;

  return 0;
}

and other than the pointless use of the variables __a and __c, I think
we ought to adopt similar coding.
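
As a rough sketch of what "similar coding" might look like with the
multiplier and addend as compile-time constants rather than variables: the
helper name rand48_step and the macro names below are made up for
illustration, and this is not the code that was eventually committed.

```c
#include <stdint.h>

/* Constants of the 48-bit LCG, same values as in the glibc code above. */
#define RAND48_MULT UINT64_C(0x5deece66d)
#define RAND48_ADD  UINT64_C(0xb)

/*
 * Advance a 48-bit state held in three 16-bit words, xsubi[0] being the
 * least significant.  (Hypothetical helper name.)
 */
static void
rand48_step(unsigned short xsubi[3])
{
    /* Assemble the 48-bit state into one 64-bit integer. */
    uint64_t x = ((uint64_t) xsubi[2] << 32) |
                 ((uint64_t) xsubi[1] << 16) |
                 (uint64_t) xsubi[0];

    /* One LCG step; bits above 47 are discarded when storing back. */
    x = x * RAND48_MULT + RAND48_ADD;

    xsubi[0] = (unsigned short) (x & 0xffff);
    xsubi[1] = (unsigned short) ((x >> 16) & 0xffff);
    xsubi[2] = (unsigned short) ((x >> 32) & 0xffff);
}
```

Storing only the low three 16-bit words performs the reduction modulo 2^48,
so no explicit modulus operation is needed.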

regards, tom lane
