Quick Links

Re: [PATCH] Fix overflow and underflow in regr_r2()

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCH] Fix overflow and underflow in regr_r2()
Date:	2026-05-16 18:03:45
Message-ID:	CAEZATCXpUwijoE2imbwWWraG63SmzgXyE+B8eoFNennu87-=kw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, 16 May 2026 at 17:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> BTW, on the principle of "where else did we make the same mistake",
> I looked through the other aggregates using float8_regr_accum.
> Most seem okay, but float8_regr_intercept does this:
>
> PG_RETURN_FLOAT8((Sy - Sx * Sxy / Sxx) / N);
>
> Seems to me that expression is also prone to internal
> overflow/underflow. Underflow probably isn't a huge issue,
> since the result will reduce to Sy/N which is likely to be good
> enough. But can we do anything about overflow?
>
> One simple change that might make things better is to compute
>
> PG_RETURN_FLOAT8((Sy - Sx * (Sxy / Sxx)) / N);
>
> on the theory that the sums of products are likely to both be large.

Hmm, that isn't necessarily better. For example, with this data:

WITH t(x,y) AS (
SELECT 1e-155 + g*1e-160, 1e155 + g*1e150
FROM generate_series(1,10) g
)
SELECT sum(x::float8) sx, sum(y::float8) sy,
regr_sxx(y,x), regr_syy(y,x), regr_sxy(y,x),
regr_intercept(y,x)
FROM t;

sx | sy | regr_sxx |
regr_syy | regr_sxy | regr_intercept
-------------------------+---------------+--------------+------------------------+-----------------------+-------------------------
1.0000550000000001e-154 | 1.000055e+156 | 8.24996e-319 |
8.249999999970085e+301 | 8.249999999965278e-09 |
-5.144448004587567e+149
(1 row)

The current regr_intercept() code works fine, but if you were to
attempt to calculate Sxy / Sxx first, it would overflow.

I think probably the least likely to overflow computation would be Sxy
* (Sx / Sxx), because Sxx is likely to be very large/small whenever Sx
is, so Sx / Sxx seems unlikely to overflow. There may well be examples
disproving that theory too though, so maybe it needs to try multiple
orderings.

Regards,
Dean

In response to

Re: [PATCH] Fix overflow and underflow in regr_r2() at 2026-05-16 16:45:16 from Tom Lane

Responses

Re: [PATCH] Fix overflow and underflow in regr_r2() at 2026-05-16 18:42:59 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2026-05-16 18:42:59	Re: [PATCH] Fix overflow and underflow in regr_r2()
Previous Message	Isaac Morland	2026-05-16 17:54:36	Re: Order of tables dumped by pg_dump