| From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [PATCH] Fix overflow and underflow in regr_r2() |
| Date: | 2026-05-16 18:03:45 |
| Message-ID: | CAEZATCXpUwijoE2imbwWWraG63SmzgXyE+B8eoFNennu87-=kw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Sat, 16 May 2026 at 17:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> BTW, on the principle of "where else did we make the same mistake",
> I looked through the other aggregates using float8_regr_accum.
> Most seem okay, but float8_regr_intercept does this:
>
> PG_RETURN_FLOAT8((Sy - Sx * Sxy / Sxx) / N);
>
> Seems to me that expression is also prone to internal
> overflow/underflow. Underflow probably isn't a huge issue,
> since the result will reduce to Sy/N which is likely to be good
> enough. But can we do anything about overflow?
>
> One simple change that might make things better is to compute
>
> PG_RETURN_FLOAT8((Sy - Sx * (Sxy / Sxx)) / N);
>
> on the theory that the sums of products are likely to both be large.
Hmm, that isn't necessarily better. For example, with this data:
WITH t(x,y) AS (
  SELECT 1e-155 + g*1e-160, 1e155 + g*1e150
  FROM generate_series(1,10) g
)
SELECT sum(x::float8) sx, sum(y::float8) sy,
       regr_sxx(y,x), regr_syy(y,x), regr_sxy(y,x),
       regr_intercept(y,x)
FROM t;
           sx            |      sy       |   regr_sxx   |        regr_syy        |       regr_sxy        |     regr_intercept
-------------------------+---------------+--------------+------------------------+-----------------------+-------------------------
 1.0000550000000001e-154 | 1.000055e+156 | 8.24996e-319 | 8.249999999970085e+301 | 8.249999999965278e-09 | -5.144448004587567e+149
(1 row)
The current regr_intercept() code works fine, but if you were to
attempt to calculate Sxy / Sxx first, it would overflow.
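To make the difference concrete, here is a minimal C sketch of the two orderings. The function names are hypothetical and this is not the PostgreSQL source; it assumes plain IEEE 754 double arithmetic with no extended-precision intermediates. With the sums from the query output above, Sxy / Sxx is roughly 1e310, which exceeds the largest finite double, so the second function returns -Infinity while the first stays finite.

```c
#include <math.h>

/*
 * Current ordering. C evaluates left to right here: Sx * Sxy first,
 * then the division by Sxx.
 */
double
intercept_current(double N, double Sx, double Sy, double Sxx, double Sxy)
{
    return (Sy - Sx * Sxy / Sxx) / N;
}

/*
 * Suggested ordering: the ratio Sxy / Sxx is formed first. If that
 * ratio overflows to infinity, the infinity propagates to the result.
 */
double
intercept_ratio_first(double N, double Sx, double Sy, double Sxx, double Sxy)
{
    return (Sy - Sx * (Sxy / Sxx)) / N;
}
```

Calling both with N = 10, Sx = 1.0000550000000001e-154, Sy = 1.000055e+156, Sxx = 8.24996e-319 and Sxy = 8.249999999965278e-09 should reproduce the behaviour described: isfinite() holds for the first result and isinf() for the second.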
I think the ordering least likely to overflow is probably Sxy *
(Sx / Sxx), because Sxx is likely to be very large or very small
whenever Sx is, so Sx / Sxx seems unlikely to overflow. There may well
be examples disproving that theory too, though, so maybe it needs to
try multiple orderings.
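One way the "try multiple orderings" idea could look is sketched below. This is only an illustration under stated assumptions, not a proposed patch: the helper names (term_ok, sx_times_slope, intercept_multi) are hypothetical, Sxx is assumed to be positive (the Sxx = 0 case is taken to be handled before this point), and IEEE 754 double arithmetic is assumed. Each ordering of Sx * Sxy / Sxx is tried in turn, and a candidate is rejected if it is non-finite or if it is zero even though both Sx and Sxy are nonzero.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical check: is t a plausible value for Sx * Sxy / Sxx? */
static bool
term_ok(double t, double Sx, double Sxy)
{
    if (!isfinite(t))
        return false;       /* an intermediate step overflowed */
    /* zero from two nonzero inputs means an intermediate underflowed */
    return t != 0.0 || Sx == 0.0 || Sxy == 0.0;
}

/* Hypothetical helper: compute Sx * Sxy / Sxx, trying three orderings. */
double
sx_times_slope(double Sx, double Sxx, double Sxy)
{
    double t;

    t = Sxy * (Sx / Sxx);   /* the ordering guessed to be safest */
    if (term_ok(t, Sx, Sxy))
        return t;

    t = Sx * (Sxy / Sxx);   /* ratio of the product sums first */
    if (term_ok(t, Sx, Sxy))
        return t;

    return Sx * Sxy / Sxx;  /* last resort: the current ordering */
}

double
intercept_multi(double N, double Sx, double Sy, double Sxx, double Sxy)
{
    return (Sy - sx_times_slope(Sx, Sxx, Sxy)) / N;
}
```

With the sums from the example data the first ordering already succeeds, since Sx / Sxx is about 1.2e164. Note that the orderings round differently, and Sy - Sx * Sxy / Sxx can cancel heavily, so results from different orderings need not agree in their low-order digits.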
Regards,
Dean