Re: random() (was Re: New GUC to sample log queries)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Geoghegan <pg(at)bowt(dot)ie>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random() (was Re: New GUC to sample log queries)
Date: 2018-12-28 23:15:05
Message-ID: 10913.1546038905@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> +1, but I wonder if just separating them is enough. Is our seeding
>> algorithm good enough for this new purpose? The initial seed is 100%
predictable to a logged-in user (it's made from the backend PID and
>> backend start time, which we tell you), and not even that hard to
>> guess from the outside, so I think Coverity's warning is an
>> understatement in this case. Even if we separate the PRNG state used
>> for internal stuff so that users can't clobber its seed from SQL,
>> wouldn't it be possible to predict which statements will survive the
>> log sampling filter given easily available information and a good
>> guess at how many times random() (or whatever similar thing) has been
>> called so far?
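
To spell out that concern: if the per-process seed is derived only from
values the server itself reports, any logged-in user can recompute it and
replay the whole random() stream.  A simplified sketch of the problem
(the mixing step here is invented for illustration; the weakness is the
inputs, not the operator):

    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Illustrative only, not our actual seeding code.  Both inputs are
     * exposed to any connected user: the PID via pg_backend_pid() and
     * the start time via pg_stat_activity.backend_start.
     */
    static void
    seed_from_public_values(uint32_t backend_pid, uint64_t start_timestamp)
    {
        srandom((unsigned int) (backend_pid ^ (uint32_t) start_timestamp));
    }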

> Yeah, that's a good point. Maybe we should upgrade the per-process
> seed initialization to make it less predictable. I could see expending
> a call of the strong RNG to contribute some more noise to the seeds
> selected in InitProcessGlobals().

Here's a simple patch to do so.
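
In outline, the idea is as follows (a sketch of the shape, not the patch
itself; see the attachment for the real thing):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* our existing strong-RNG wrapper, declared in port.h */
    extern bool pg_strong_random(void *buf, size_t len);

    /*
     * Spend one strong-RNG call per process to seed random(), keeping
     * a PID/start-time mix only as a fallback if that source fails.
     */
    static void
    init_random_seed(int proc_pid, int64_t start_timestamp)
    {
        uint64_t    rseed;

        if (!pg_strong_random(&rseed, sizeof(rseed)))
            rseed = ((uint64_t) proc_pid) ^ ((uint64_t) start_timestamp);
        srandom((unsigned int) rseed);
    }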

Looking at this, I seem to remember that we considered doing exactly this
a while ago, but refrained because there was concern about depleting the
system's reserve of entropy if we have a high backend spawn rate, and it
didn't seem like there was a security reason to insist on unpredictable
random() results. However, the log-sampling patch destroys the latter
argument. As for the former argument, I'm not sure how big a deal that
really is. Presumably, the act of spawning a backend would itself
contribute some more entropy to the pool (particularly if a network
connection is involved), so the depletion problem might be fictitious
in the first place. Also, a few references I consulted, such as the
Linux urandom(4) man page, suggest that even in a depleted-entropy
state the results of reading /dev/urandom should be random enough
for all but the very strictest security requirements.
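
For instance, a read like the following returns immediately no matter how
low the kernel's entropy estimate is, whereas the old /dev/random could
block in that state:

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Quick demonstration of the urandom(4) behavior cited above:
     * reads from /dev/urandom never block, even when the kernel's
     * entropy estimate is depleted.
     */
    int
    main(void)
    {
        uint64_t    buf;
        FILE       *f = fopen("/dev/urandom", "rb");

        if (f == NULL)
            return 1;
        if (fread(&buf, sizeof(buf), 1, f) != 1)
        {
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("%016llx\n", (unsigned long long) buf);
        return 0;
    }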

Thoughts?

regards, tom lane

Attachment Content-Type Size
make-random-seed-more-random-1.patch text/x-diff 2.5 KB
