Re: surprising results with random()

From: raf <raf(at)raf(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: surprising results with random()
Date: 2009-02-24 00:16:18
Message-ID: 20090224001618.GA2676@raf.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Steve Atkins wrote:

>
> On Feb 23, 2009, at 2:09 PM, Jessi Berkelhammer wrote:
>
> >Hi,
> >
> >I have a view in which I want to randomly assign values if certain
> >conditions hold. I was getting surprising results. Here is a (very)
> >simplified version of the view, which seems to indicate the problem:
> >
> >CREATE OR REPLACE VIEW test_view AS
> >SELECT
> > CASE
> > WHEN random() < .3333 THEN '1'
> > WHEN random() < .3333 THEN '2'
> > ELSE '3'
> > END AS test_value
> >
> >FROM client ;
> >
> >It seems this should generate a random number between 0 and 1, and set
> >test_value to '1' if this first generated number is less than .3333.
> >Otherwise, it should generate another random number, and set
> >test_value
> >to '2' if this is less than .3333. And if neither of the random
> >numbers
> >are less than .3333, it should set test_value to '3'. It seems to me
> >that there should be a relative even distribution of the 3 values.
> >
> >
> >However when I run this, the values are always similar to what is
> >below:
> >
> >X_test=> select test_value, count(*) from test_view group by 1
> >order by 1;
> >test_value | count
> >------------+-------
> >1 | 23947
> >2 | 16061
> >3 | 32443
> >
> >Why are there significantly fewer 2s? I understand that random() is
> >not
> >truly random, and that the seed affects this value. But it still
> >confuses me that, no matter how many times I run this, there are
> >always
> >so few 2s. If it is generating an independent random number in the
> >second call to random(), then I don't know why there are more so many
> >more 1s than 2s.
>
> Nope, it's nothing to do with random(), it's that your maths is wrong.
>
> There are 9 possible cases. In 3 of them you return 1. In 2 of them you
> return 2. In the remaining 4 cases you return 3.
>
> If you were to run this 72451 times I'd expect to see
> 1: 24150 = 72451 * 3/9
> 2: 16100 = 72451 * 2/9
> 3: 32200 = 72451 * 4/9
>
> Which, unsurprisingly, is fairly close to what you get.
>
> Cheers,
> Steve

this looks like an attempt to understand the monty hall problem.
http://en.wikipedia.org/wiki/Monty_Hall_problem
except that there's no goat :(

cheers,
raf

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jordan Tomkinson 2009-02-24 00:17:54 Re: High cpu usage after many inserts
Previous Message Tom Lane 2009-02-24 00:16:02 Re: [Fwd: Re: surprising results with random()]