Quick Links

Re: BUG #19340: Wrong result from CORR() function

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Oleg Ivanov <o15611(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #19340: Wrong result from CORR() function
Date:	2025-12-05 09:23:59
Message-ID:	CAEZATCU8=jjP1+dSSos6oXVzToGu-Rofyhv4BBpU31LN9Bo17Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Attached is a fleshed-out patch proposal that fixes the related
> aggregates and adds test cases.
>

Looking at float8_regr_accum(), I think it would be preferable to
arrange for it to leave Sxx, Syy, and Sxy zero until distinct X and Y
values are seen. I.e., something like this:

if (newvalX != commonX || isnan(newvalX))
commonX = get_float8_nan();
if (newvalY != commonY || isnan(newvalY))
commonY = get_float8_nan();

if (isnan(commonX) || isnan(commonY))
{
tmpX = newvalX * N - Sx;
tmpY = newvalY * N - Sy;
scale = 1.0 / (N * transvalues[0]);
if (isnan(commonX))
Sxx += tmpX * tmpX * scale;
if (isnan(commonY))
Syy += tmpY * tmpY * scale;
if (isnan(commonX) && isnan(commonY))
Sxy += tmpX * tmpY * scale;

... Overflow check ...
}

This would mean that float8_corr(), float8_regr_r2(),
float8_regr_slope(), and float8_regr_intercept() would not need to
look at commonX or commonY, and could simply rely on Sxx == 0 or Syy
== 0 to detect horizontal and vertical lines.

Aside from making the code simpler, this would guarantee that the
aggregate functions regr_sxx() and regr_syy() would return exactly
zero for all-constant X and Y inputs respectively, and that
regr_sxy(), covar_pop(), and covar_samp() would return exactly zero if
either the X or the Y inputs were all constant.

Something else that occurred to me was that float8_regr_avgx() and
float8_regr_avgy() might as well make use of commonX and commonY,
since we're calculating them, so they would return exact averages if
all the X or Y values were the same, rather than results with possible
rounding errors.

I also wonder if it would be worth doing something similar for the
single-variable aggregates so that var_pop(), var_samp(),
stddev_pop(), and stddev_samp() would all return exactly zero, and
avg() would return the exact common value, if all the inputs were
constant.

Regards,
Dean

In response to

Re: BUG #19340: Wrong result from CORR() function at 2025-12-03 22:52:31 from Tom Lane

Responses

Re: BUG #19340: Wrong result from CORR() function at 2025-12-05 22:17:33 from Tom Lane

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Dean Rasheed	2025-12-05 10:17:15	Re: BUG #19340: Wrong result from CORR() function
Previous Message	PG Bug reporting form	2025-12-05 02:41:01	BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure