Re: [PATCH] Fix overflow and underflow in regr_r2()

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix overflow and underflow in regr_r2()
Date: 2026-05-16 18:03:45
Message-ID: CAEZATCXpUwijoE2imbwWWraG63SmzgXyE+B8eoFNennu87-=kw@mail.gmail.com
Lists: pgsql-hackers

On Sat, 16 May 2026 at 17:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> BTW, on the principle of "where else did we make the same mistake",
> I looked through the other aggregates using float8_regr_accum.
> Most seem okay, but float8_regr_intercept does this:
>
> PG_RETURN_FLOAT8((Sy - Sx * Sxy / Sxx) / N);
>
> Seems to me that expression is also prone to internal
> overflow/underflow. Underflow probably isn't a huge issue,
> since the result will reduce to Sy/N which is likely to be good
> enough. But can we do anything about overflow?
>
> One simple change that might make things better is to compute
>
> PG_RETURN_FLOAT8((Sy - Sx * (Sxy / Sxx)) / N);
>
> on the theory that the sums of products are likely to both be large.

Hmm, that isn't necessarily better. For example, with this data:

WITH t(x,y) AS (
  SELECT 1e-155 + g*1e-160, 1e155 + g*1e150
  FROM generate_series(1,10) g
)
SELECT sum(x::float8) sx, sum(y::float8) sy,
       regr_sxx(y,x), regr_syy(y,x), regr_sxy(y,x),
       regr_intercept(y,x)
FROM t;

           sx            |      sy       |   regr_sxx   |        regr_syy        |       regr_sxy        |      regr_intercept
-------------------------+---------------+--------------+------------------------+-----------------------+-------------------------
 1.0000550000000001e-154 | 1.000055e+156 | 8.24996e-319 | 8.249999999970085e+301 | 8.249999999965278e-09 | -5.144448004587567e+149
(1 row)

The current regr_intercept() code works fine, but if you were to
attempt to calculate Sxy / Sxx first, it would overflow, since
8.25e-09 / 8.25e-319 is roughly 1e310, well beyond DBL_MAX (about
1.8e308).

I think the computation least likely to overflow would probably be
Sxy * (Sx / Sxx), because Sxx is likely to be very large or very small
whenever Sx is, so Sx / Sxx seems unlikely to overflow. There may well
be examples that disprove that theory too, though, so perhaps it needs
to try multiple orderings.

Regards,
Dean
