Re: [PATCH] random_normal function

From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] random_normal function
Date: 2022-12-09 18:51:53
Message-ID: D1722BB1-78A1-48ED-BE61-EF49648BBD6B@cleverelephant.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Dec 9, 2022, at 10:39 AM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>> On Dec 8, 2022, at 1:53 PM, Paul Ramsey <pramsey(at)cleverelephant(dot)ca> wrote:
>>
>> Just a utility function to generate random numbers from a normal
>> distribution. I find myself doing this several times a year, and I am
>> sure I must not be the only one.
>
> Thanks for the patch. What do you think about these results?

Angels on pins time! :)

> +-- The semantics of a negative stddev are not well defined
> +SELECT random_normal(mean := 0, stddev := -1);
> + random_normal
> +---------------------
> + -1.0285744583010896
> +(1 row)

Question is does a negative stddev make enough sense? It is functionally using fabs(stddev),

SELECT avg(random_normal(mean := 0, stddev := -1)) from generate_series(1,1000);
avg
---------------------
0.03156106778729526

So could toss an invalid parameter on negative? Not sure if that's more helpful than just being mellow about this input.

> +
> +SELECT random_normal(mean := 0, stddev := '-Inf');
> + random_normal
> +---------------
> + Infinity
> +(1 row)

The existing logic around means and stddevs and Inf is hard to tease out:

SELECT avg(v),stddev(v) from (VALUES ('Inf'::float8, '-Inf'::float8)) a(v);
avg | stddev
----------+--------
Infinity |

The return of NULL of stddev would seem to argue that null in this case means "does not compute" at some level. So return NULL on Inf stddev?

> +
> +-- This result may be defensible...
> +SELECT random_normal(mean := '-Inf', stddev := 'Inf');
> + random_normal
> +---------------
> + -Infinity
> +(1 row)
> +
> +-- but if so, why is this NaN?
> +SELECT random_normal(mean := 'Inf', stddev := 'Inf');
> + random_normal
> +---------------
> + NaN
> +(1 row)

An Inf mean only implies that one value in the distribution is Inf, but running the function in reverse (generating values) and only generating one value from the distribution implies we have to always return Inf (except in this case stddev is also Inf, so I'd go with NULL, assuming we accept the NULL premise above.

How do you read the tea leaves?

P.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2022-12-09 19:01:20 Re: [PATCH] random_normal function
Previous Message Nathan Bossart 2022-12-09 18:44:11 Re: add \dpS to psql