Quick Links

Re: random() (was Re: New GUC to sample log queries)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, vik(dot)fearing(at)2ndquadrant(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: random() (was Re: New GUC to sample log queries)
Date:	2018-12-28 17:29:52
Message-ID:	1551.1546018192@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> but I don't feel a need for replacing the algorithm.

> Hmmm. Does it mean that you would veto any change, even if the speed
> concern is addressed (i.e. faster/not slower with better quality)?

Well, not veto exactly, but I'd be suspicious of it.

First, erand48 has been around a *long* time and its properties are pretty
well understood; these other algorithms you found on the net have no real
pedigree IMO. Moreover, since it is standard, there's a lower cognitive
burden on people to understand what it is and what it can be trusted for.

Second, we don't actually have a problem we need to fix by changing the
algorithm. We do need to worry about keeping drandom's state separate
from the internal random() usages, but that's independent of what the
algorithm is. Nobody has complained that random() is insufficiently
random, only that the seed might be predictable.

I do agree, after closer inspection, that our current coding of _dorand48
is a hangover from machines lacking 64-bit arithmetic. glibc does it like
this:

int
__drand48_iterate (unsigned short int xsubi[3], struct drand48_data *buffer)
{
uint64_t X;
uint64_t result;

/* Initialize buffer, if not yet done. */
if (__glibc_unlikely (!buffer->__init))
{
buffer->__a = 0x5deece66dull;
buffer->__c = 0xb;
buffer->__init = 1;
}

/* Do the real work. We choose a data type which contains at least
48 bits. Because we compute the modulus it does not care how
many bits really are computed. */

X = (uint64_t) xsubi[2] << 32 | (uint32_t) xsubi[1] << 16 | xsubi[0];

result = X * buffer->__a + buffer->__c;

xsubi[0] = result & 0xffff;
xsubi[1] = (result >> 16) & 0xffff;
xsubi[2] = (result >> 32) & 0xffff;

return 0;
}

and other than the pointless use of variable __a and __c, I think
we ought to adopt similar coding.

regards, tom lane

In response to

Re: random() (was Re: New GUC to sample log queries) at 2018-12-28 08:18:37 from Fabien COELHO

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2018-12-28 17:34:44	Re: removal of dangling temp tables
Previous Message	Tom Lane	2018-12-28 16:47:25	Re: removal of dangling temp tables