| From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
|---|---|
| To: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Subject: | Re: [PATCH] Fix overflow and underflow in regr_r2() |
| Date: | 2026-05-16 09:39:30 |
| Message-ID: | CAEZATCUaV+qmBCh0zv0EPdBvhE2skHCzkRw78VNjBMCX9Z7h+w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, 27 Apr 2026 at 12:18, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
>
> While looking at the corr() overflow/underflow discussion [1], I noticed
> that regr_r2() still computes
>
> (Sxy * Sxy) / (Sxx * Syy)
>
> directly. At very small or very large scales, those products can round
> to zero or infinity even when the ratio itself is finite.
Yes, good point.
> corr() already has a stabilized calculation for the same Sxx * Syy
> denominator scale. This patch factors that into a helper and lets
> regr_r2() use it as a fallback when one of its direct products has
> rounded to zero or infinity. Otherwise, regr_r2() keeps the existing
> direct formula.
The comments need work -- in particular float8_regr_r2() needs a
comment explaining the new overflow/underflow checks, similar to the
comment in float8_corr(). In fact, doing that, I think it's preferable
to just keep this change local to float8_regr_r2(), rather than
refactoring into a helper function for just a few lines of code.
This new check in float8_regr_r2():
+ if (Sxy == 0 && !isnan(Sxx) && !isnan(Syy))
+ PG_RETURN_FLOAT8(0.0);
seems pointless. It's optimising for a special case that will very
rarely occur in practice, and which is handled fine by the general
code. We don't want to slow down the common code path for such rare
special cases.
I noticed that this new overflow test case:
+SELECT regr_r2(1e154::float8 * g, 1e154::float8 * g)
+ FROM generate_series(1, 2) g;
+ regr_r2
+---------
+ 1
+(1 row)
only produces 1 because it's run with a reduced extra_float_digits
value. I think it's better to use the test values "1e100 + g * 1e95",
which still trigger the overflow on HEAD, but more reliably produce 1,
regardless of the extra_float_digits setting, making it less likely
that there will be variations between platforms. That's also more
consistent with the other nearby test cases.
Attached is a v2 patch with those changes, plus a little more tidying
up of the regression tests.
Arguably, this is a bug fix, but given the lack of prior complaints, I
think we should apply this to HEAD only, like 6498287696d, meaning it
will only appear in PG19.
Regards,
Dean
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Improve-overflow-underflow-handling-in-regr_r2.patch | text/x-patch | 7.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zsolt Parragi | 2026-05-16 11:15:34 | Re: [PATCH] Add NESTED_STATEMENTS option to EXPLAIN |
| Previous Message | Etsuro Fujita | 2026-05-16 09:10:00 | Re: Import Statistics in postgres_fdw before resorting to sampling. |