| From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Oleg Ivanov <o15611(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19340: Wrong result from CORR() function |
| Date: | 2025-12-05 09:23:59 |
| Message-ID: | CAEZATCU8=jjP1+dSSos6oXVzToGu-Rofyhv4BBpU31LN9Bo17Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Attached is a fleshed-out patch proposal that fixes the related
> aggregates and adds test cases.
>
Looking at float8_regr_accum(), I think it would be preferable to
arrange for it to leave Sxx, Syy, and Sxy zero until distinct X and Y
values are seen. I.e., something like this:
if (newvalX != commonX || isnan(newvalX))
commonX = get_float8_nan();
if (newvalY != commonY || isnan(newvalY))
commonY = get_float8_nan();
if (isnan(commonX) || isnan(commonY))
{
tmpX = newvalX * N - Sx;
tmpY = newvalY * N - Sy;
scale = 1.0 / (N * transvalues[0]);
if (isnan(commonX))
Sxx += tmpX * tmpX * scale;
if (isnan(commonY))
Syy += tmpY * tmpY * scale;
if (isnan(commonX) && isnan(commonY))
Sxy += tmpX * tmpY * scale;
... Overflow check ...
}
This would mean that float8_corr(), float8_regr_r2(),
float8_regr_slope(), and float8_regr_intercept() would not need to
look at commonX or commonY, and could simply rely on Sxx == 0 or Syy
== 0 to detect horizontal and vertical lines.
Aside from making the code simpler, this would guarantee that the
aggregate functions regr_sxx() and regr_syy() would return exactly
zero for all-constant X and Y inputs respectively, and that
regr_sxy(), covar_pop(), and covar_samp() would return exactly zero if
either the X or the Y inputs were all constant.
Something else that occurred to me was that float8_regr_avgx() and
float8_regr_avgy() might as well make use of commonX and commonY,
since we're calculating them, so they would return exact averages if
all the X or Y values were the same, rather than results with possible
rounding errors.
I also wonder if it would be worth doing something similar for the
single-variable aggregates so that var_pop(), var_samp(),
stddev_pop(), and stddev_samp() would all return exactly zero, and
avg() would return the exact common value, if all the inputs were
constant.
Regards,
Dean
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dean Rasheed | 2025-12-05 10:17:15 | Re: BUG #19340: Wrong result from CORR() function |
| Previous Message | PG Bug reporting form | 2025-12-05 02:41:01 | BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure |