Definitional issue: stddev_pop (and related) for 1 input

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Definitional issue: stddev_pop (and related) for 1 input
Date: 2020-06-11 18:06:06
Message-ID: 353062.1591898766@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Before v12, stddev_pop() had the following behavior with just a
single input value:

regression=# SELECT stddev_pop('42'::float8);
stddev_pop
------------
0
(1 row)

regression=# SELECT stddev_pop('inf'::float8);
stddev_pop
------------
NaN
(1 row)

regression=# SELECT stddev_pop('nan'::float8);
stddev_pop
------------
NaN
(1 row)

As of v12, though, all three cases produce 0. I am not sure what
to think about that with respect to an infinity input, but I'm
quite sure I don't like it for NaN input.

It looks like the culprit is the introduction of the "Youngs-Cramer"
algorithm in float8_accum: nothing is done to Sxx at the first iteration,
even if the input is inf or NaN. I'd be inclined to force Sxx to NaN
when the first input is NaN, and perhaps also when it's Inf.
Alternatively we could clean up in the finalization routine by noting
that Sx is Inf/NaN, but that seems messier. Thoughts?

(I came across this by noting that the results don't agree with
numeric accumulation, which isn't using Youngs-Cramer.)

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2020-06-11 18:14:02 Re: hashagg slowdown due to spill changes
Previous Message Andres Freund 2020-06-11 17:45:25 Re: hashagg slowdown due to spill changes