Re: [PATCH] random_normal function

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] random_normal function
Date: 2023-01-09 23:38:39
Message-ID: 67201.1673307519@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On Mon, 9 Jan 2023 at 18:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (We could probably go further
>> than this, like trying to verify distribution properties. But
>> it's been too long since college statistics for me to want to
>> write such tests myself, and I'm not real sure we need them.)

> I played around with the Kolmogorov–Smirnov test, to test random() for
> uniformity. The following passes roughly 99.9% of the time:

Ah, cool, thanks for this code.

> With a one-in-a-thousand chance of failing, if you wanted something
> with around a one-in-a-billion chance of failing, you could just try
> it 3 times:
> SELECT ks_test_uniform_random() OR
> ks_test_uniform_random() OR
> ks_test_uniform_random();
> but it feels pretty hacky, and probably not really necessary.

That seems like a good way, because I'm not satisfied with
one-in-a-thousand odds if we want to remove the "ignore" marker.
It's still plenty fast enough: on my machine, the v2 patch below
takes about 19ms, versus 22ms for the script as it stands in HEAD.

> Rigorous tests for other distributions are harder, but also probably
> not necessary if we have confidence that the underlying PRNG is
> uniform.

Agreed.

>> BTW, if this does bring the probability of failure down to the
>> one-in-a-billion range, I think we could also nuke the whole
>> "ignore:" business, simplifying pg_regress and allowing the
>> random test to be run in parallel with others.

> I didn't check the one-in-a-billion claim, but +1 for that.

I realized that we do already run random in a parallel group;
the "ignore: random" line in parallel_schedule just marks it
as failure-ignorable, it doesn't schedule it. (The comment is a
bit misleading about this, but I want to remove that not rewrite it.)
Nonetheless, nuking the whole ignore-failures mechanism seems like
good cleanup to me.

Also, I tried this on some 32-bit big-endian hardware (NetBSD on macppc)
to verify my thesis that the results of random() are now machine
independent. That part works, but the random_normal() tests didn't;
I saw low-order-bit differences from the results on x86_64 Linux.
Presumably, that's because one or more of sqrt()/log()/sin() are
rounding off a bit differently there. v2 attached deals with this by
backing off to "extra_float_digits = 0" for that test. Once it hits the
buildfarm we might find we have to reduce extra_float_digits some more,
but that was enough to make NetBSD/macppc happy.

regards, tom lane

Attachment Content-Type Size
improve-random-tests-2.patch text/x-diff 11.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-09 23:38:58 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Previous Message Jelte Fennema 2023-01-09 23:25:14 Re: Allow +group in pg_ident.conf