Re: random() generates collisions too early

From: Joe Van Dyk <joe(at)tanga(dot)com>
To: Honza Horak <hhorak(at)redhat(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: random() generates collisions too early
Date: 2013-10-21 19:40:53
Message-ID: CACfv+pJ--Ky6TAmtNLke7ZqSZ8rE2t6ugTbH0d9=M=7io3Wwrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Oddly enough, I'm debugging a problem with a function that uses random()
and that is now generating collisions after I upgraded to 9.3.

CREATE OR REPLACE FUNCTION public.random_characters(length integer)
RETURNS text
LANGUAGE sql
AS $function$
SELECT array_to_string(array((
SELECT SUBSTRING('abcdefghjkmnpqrstuvwxyz23456789'
FROM mod((random()*31)::int, 31)+1 FOR 1)
FROM generate_series(1, $1))),'');
$function$;

It's being used in a column definition like:
citext unique not null default upper(random_characters(8))

I can't make a self-contained reproducible test case now, I've tried but no
luck yet.

In the actual app, I'm getting collisions after only a couple hundred rows
are inserted. I never had a problem with < 9.3. The rows are being inserted
in a trigger, if that matters.

Joe

On Fri, Oct 18, 2013 at 4:55 AM, Honza Horak <hhorak(at)redhat(dot)com> wrote:

> Hi guys,
>
> after playing a bit with "select random();", it turned out that numbers
> get repeated quite early in a sequence. Originally I set lower PID range
> (echo "2048" >/proc/sys/kernel/pid_max), but it doesn't seem to affect the
> results.
>
> So, what I observed... First, I generated a set including 1000 randomly
> generated numbers without setting a seed.
>
> touch numbers
> for i in {1..1000} ; do
> echo "select random();"|psql|head -n 3|tail -n 1 >>numbers
> done
>
> Then, I continued in generating random numbers and tried to find the new
> one in the set:
>
> for i in {1..10000} ; do
> if grep `echo "select random();"|psql|head -n 3|tail -n 1` numbers ;
> then
> echo "SUCCESS: $i" ; break
> fi
> done
>
> To my surprise I'm able to find a collision very quickly, in first 1000
> numbers usually.
>
> Originally, I used psql calls to get random() value from different
> processes on purpose, but it seems Noah got similar results when random()
> is called in one process:
>
> On 10/18/2013 02:10 AM, Noah Misch wrote:
> > sudo sysctl -w kernel.pid_max=2048
> > psql -c 'create unlogged table samp(c float8)'
> > for n in `seq 1 200000`; do psql -qc 'insert into samp values
> (random())'; done
> >
> > The results covered only 181383 distinct values, and 68 values repeated
> four
> > or five times each. We should at least consider using a higher-entropy
> seed.
>
> As I was told this is not taken as a security issue, since random() is not
> considered as a CSPRNG in any case, but as Noah said, we should probably
> try to make it a bit better.
>
> Also, I'd suggest to state explicitly in the doc, that random() shouldn't
> be taken as CSPRNG, since I can imagine people blindly believing that
> random() can be good enough for such use cases, just because they see how
> many possible values they get from double-precision type:
> http://www.postgresql.org/**docs/9.3/static/functions-**math.html<http://www.postgresql.org/docs/9.3/static/functions-math.html>
>
> Regards,
> Honza
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-bugs<http://www.postgresql.org/mailpref/pgsql-bugs>
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message lalbin 2013-10-21 21:17:16 BUG #8545: pg_dump fails to backup database level grants
Previous Message bentzi.mor 2013-10-21 16:48:43 BUG #8544: High nfs getattr