Quick Links

Re: Functions to return random numbers in a given range

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject:	Re: Functions to return random numbers in a given range
Date:	2024-01-30 13:44:25
Message-ID:	CAEZATCW5q+RAxoK+zvyKQa2yqeDam4qK=ChJW0ngBx+qn46vrQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, 30 Jan 2024 at 12:47, Aleksander Alekseev
<aleksander(at)timescale(dot)com> wrote:
>
> Maybe I'm missing something but I'm not sure if I understand what this
> test tests particularly:
>
> ```
> -- There should be no triple duplicates in 1000 full-range 32-bit random()
> -- values. (Each of the C(1000, 3) choices of triplets from the 1000 values
> -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the
> -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which
> -- is roughly 9e-12.)
> SELECT r, count(*)
> FROM (SELECT random(-2147483648, 2147483647) r
> FROM generate_series(1, 1000)) ss
> GROUP BY r HAVING count(*) > 2;
> ```
>
> The intent seems to be to check the fact that random numbers are
> distributed evenly. If this is the case I think the test is wrong. The
> sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4,
> 12, 45 and every particular sequence has low probability. All in all
> personally I would argue that this is a meaningless test that just
> fails with a low probability. Same for the tests that follow below.
>

I'm following the same approach used to test the existing random
functions, and the idea is the same. For example, this existing test:

-- There should be no duplicates in 1000 random() values.
-- (Assuming 52 random bits in the float8 results, we could
-- take as many as 3000 values and still have less than 1e-9 chance
-- of failure, per https://en.wikipedia.org/wiki/Birthday_problem)
SELECT r, count(*)
FROM (SELECT random() r FROM generate_series(1, 1000)) ss
GROUP BY r HAVING count(*) > 1;

If the underlying PRNG were non-uniform, or the method of reduction to
the required range was flawed in some way that reduced the number of
actual possible return values, then the probability of duplicates
would be increased. A non-uniform distribution would probably be
caught by the KS tests, but uniform gaps in the possible outputs might
not be, so I think this test still has value.

> The proper way of testing PRNG would be to call setseed() and compare
> return values with expected ones. I don't mind testing the proposed
> invariants but they should do this after calling setseed(). Currently
> the patch places the tests right before the call.
>

There are also new tests of that nature, following the call to
setseed(0.5). They're useful for a quick visual check of the results,
and confirming the expected number of digits after the decimal point
in the numeric case. However, I think those tests are insufficient on
their own.

Regards,
Dean

In response to

Re: Functions to return random numbers in a given range at 2024-01-30 12:47:36 from Aleksander Alekseev

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	jian he	2024-01-30 14:15:20	Re: POC, WIP: OR-clause support for indexes
Previous Message	Nazir Bilal Yavuz	2024-01-30 13:43:31	Re: compiling postgres on windows arm using meson