Re: [PATCH] random_normal function

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] random_normal function
Date: 2023-01-09 12:03:34
Message-ID: CAEZATCXDi2VkMAJSJ5=VNopTVDFfThoyQmNBaPLQQKJ+t7xSzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 9 Jan 2023 at 00:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> This version seems committable to me, barring objections.
>

Whilst I have no objection to adding random_normal(), ISTM that we're
at risk of adding an arbitrary set of random functions without a clear
idea of where we'll end up, and how they'll affect one another (shared
state or not).

For example, random_normal() uses a PRNG and it shares the same state
as random()/setseed(). That's fine, and presumably if we add other
continuous distributions like exponential, they'll follow suit.

But what if we add something like random_int() to return a random
integer in a range (something that has been suggested before, and I
think would be very useful), or other discrete random functions? Would
they use a PRNG and share the same state? If so, having that state in
float.c wouldn't make sense anymore. If not, will they have their own
seed-setting functions, and how many seed-setting functions will we
end up with?

Over on [1] we're currently heading towards adding array_shuffle() and
array_sample() using a PRNG with a separate state shared just by those
2 functions, with no way to seed it, and not marking them as PARALLEL
RESTRICTED, so they can't be made deterministic.

ISTM that random functions should fall into 1 of 2 clearly defined
categories -- strong random functions and pseudorandom functions. IMO
all pseudorandom functions should be PARALLEL RESTRICTED and have a
way to set their seed, so that they can be made deterministic --
something that's very useful for users writing tests.

Now maybe having multiple seed-setting functions (one per datatype?)
is OK, but it seems unnecessary and cumbersome to me. Why would you
ever want to seed the random_int() sequence and the random() sequence
differently? No other library of random functions I know of behaves
that way.

So IMO all pseudorandom functions should share the same PRNG state and
seed-setting functions. That would mean they should all be in the same
(new) C file, so that the PRNG state can be kept private to that file.

I think it would also make sense to add a new "Random Functions"
section to the docs, and move the descriptions of random(),
random_normal() and setseed() there. That way, all the functions
affected by setseed() can be kept together on one page (future random
functions may not all be sensibly classified as "mathematical").

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/9d160a44-7675-51e8-60cf-6d64b76db831(at)aboutsource(dot)net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-01-09 12:23:14 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message shiy.fnst@fujitsu.com 2023-01-09 11:59:11 Fix pg_publication_tables to exclude generated columns