From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] random_normal function
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On Mon, 9 Jan 2023 at 18:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (We could probably go further
>> than this, like trying to verify distribution properties. But
>> it's been too long since college statistics for me to want to
>> write such tests myself, and I'm not real sure we need them.)
> I played around with the Kolmogorov–Smirnov test, to test random() for
> uniformity. The following passes roughly 99.9% of the time:
Ah, cool, thanks for this code.
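(For anyone following along: the statistic behind Dean's function can be sketched in a few lines. This is an illustrative Python version of a one-sample Kolmogorov-Smirnov test against Uniform(0,1), not the actual plpgsql code; the function name and the ~1.95/sqrt(n) critical value are the standard asymptotic approximation for a ~0.1% significance level.)

```python
import random

def ks_uniform_statistic(samples):
    """One-sample Kolmogorov-Smirnov statistic against Uniform(0,1):
    the largest gap between the empirical CDF and the uniform CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Empirical CDF jumps from i/n to (i+1)/n at x; uniform CDF is x.
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d

# Asymptotic critical value for significance level ~0.001 is roughly
# 1.95 / sqrt(n), so a sound PRNG passes about 99.9% of the time.
n = 1000
samples = [random.random() for _ in range(n)]
print(ks_uniform_statistic(samples) < 1.95 / n ** 0.5)
```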
> With a one-in-a-thousand chance of failing, if you wanted something
> with around a one-in-a-billion chance of failing, you could just try
> it 3 times:
> SELECT ks_test_uniform_random() OR
>        ks_test_uniform_random() OR
>        ks_test_uniform_random();
> but it feels pretty hacky, and probably not really necessary.
That seems like a good way to proceed, because I'm not satisfied
with one-in-a-thousand odds if we want to remove the "ignore" marker.
It's still plenty fast enough: on my machine, the v2 patch below
takes about 19ms, versus 22ms for the script as it stands in HEAD.
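(The arithmetic behind the one-in-a-billion figure is just independence of the retries: the ORed test fails only when all three runs fail.)

```python
# Each run has roughly a one-in-a-thousand chance of a false failure.
# Three independent retries, ORed together, fail only if all three do.
p_single = 1e-3
p_three = p_single ** 3  # about 1e-9, i.e. one in a billion
print(p_three)
```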
> Rigorous tests for other distributions are harder, but also probably
> not necessary if we have confidence that the underlying PRNG is
>> BTW, if this does bring the probability of failure down to the
>> one-in-a-billion range, I think we could also nuke the whole
>> "ignore:" business, simplifying pg_regress and allowing the
>> random test to be run in parallel with others.
> I didn't check the one-in-a-billion claim, but +1 for that.
I realized that we do already run random in a parallel group;
the "ignore: random" line in parallel_schedule just marks it
as failure-ignorable, it doesn't schedule it. (The comment is a
bit misleading about this, but I'd rather remove it than rewrite it.)
Nonetheless, nuking the whole ignore-failures mechanism seems like
good cleanup to me.
Also, I tried this on some 32-bit big-endian hardware (NetBSD on macppc)
to verify my thesis that the results of random() are now
machine-independent. That part works, but the random_normal() tests
didn't: I saw low-order-bit differences from the results on x86_64 Linux.
Presumably, that's because one or more of sqrt()/log()/sin() are
rounding off a bit differently there. v2 attached deals with this by
backing off to "extra_float_digits = 0" for that test. Once it hits the
buildfarm we might find we have to reduce extra_float_digits some more,
but that was enough to make NetBSD/macppc happy.
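(For context on why the low-order bits drift: if random_normal() uses a Box-Muller-style transform, which the sqrt()/log()/sin() trio suggests, every one of those libm calls rounds according to the platform's math library. A Python sketch of the shape of that computation, purely illustrative and not the patch's C code:)

```python
import math
import random

def box_muller(rng):
    """Draw one standard-normal value from two uniform draws.
    log(), sqrt(), and sin() each round in the platform's libm, so the
    low-order bits of the result can legitimately differ across machines,
    which is why the test backs off to extra_float_digits = 0."""
    u1 = 1.0 - rng.random()  # in (0, 1], keeps log() finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)

rng = random.Random(0)
samples = [box_muller(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # close to 0 and 1, as expected for a standard normal
```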
regards, tom lane