Re: BUG #19340: Wrong result from CORR() function

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Ivanov <o15611(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19340: Wrong result from CORR() function
Date: 2025-12-05 10:17:15
Message-ID: CAEZATCUyPKDb5z7yasp5v5HN9dTzPKqqa4u20goJK2fC15LpcQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Poking at this, I soon found a test case where even with the separate
> sqrt() calls we'd produce a result slightly outside [-1, 1] (running
> this test over more values of x is sufficient). So now I think we
> should do both the separate sqrt and the clamp.
>

I'm starting to have doubts about having 2 sqrt() calls. The problem
is that it seems to produce a noticeable reduction in accuracy in
quite a few cases. This is especially noticeable with fully-correlated
data. For example:

SELECT n, (SELECT corr(x, x) FROM generate_series(1, n) x)
FROM generate_series(1, 10) g(n);

n | corr
----+--------------------
1 |
2 | 0.9999999999999998
3 | 0.9999999999999998
4 | 0.9999999999999998
5 | 0.9999999999999998
6 | 1
7 | 0.9999999999999999
8 | 1
9 | 0.9999999999999999
10 | 1
(10 rows)

Now I'm not sure that the current code can be expected to get cases
like this exactly right 100% of the time, but it's pretty close. For
example, if I do this:

WITH t1 AS (
SELECT n, random() * 1000 AS r FROM generate_series(1, 1000000) n
), t2 AS (
SELECT corr(r, r) FROM t1 GROUP BY n % 10000
)
SELECT count(*), count(*) FILTER (WHERE corr != 1) FROM t2;

on HEAD it produced corr = 1 every time I ran it, whereas the patch
gives rounding errors roughly 25% of the time, which seems likely to
be noticed.

Perhaps we should only use 2 sqrt()'s if the product Sxx * Syy overflows.

Regards,
Dean

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Anthonin Bonnefoy 2025-12-05 13:33:26 Re: Segfault due to NULL ParamExecData value
Previous Message Dean Rasheed 2025-12-05 09:23:59 Re: BUG #19340: Wrong result from CORR() function