Re: BUG #19340: Wrong result from CORR() function

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Ivanov <o15611(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19340: Wrong result from CORR() function
Date: 2025-12-02 18:54:20
Message-ID: CAEZATCU8rnyKN3z_5-osk3Bn8dtzWf9nKjTr2E16-ExXiESNrQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, 2 Dec 2025 at 17:22, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> It's fundamentally impossible to guarantee exact results with
> floating-point arithmetic, so if you are expecting that you need
> to readjust your expectations. But having said that, it does
> seem a bit sad that we can't detect constant-input cases exactly.

Yes, indeed. I tested the following query:

SELECT n,
(SELECT variance(1.3::float8) FROM generate_series(1, n)),
(SELECT corr(1.3, 1.3) FROM generate_series(1, n))
FROM generate_series(1, 10) g(n);

In v11 (with the old algorithm) this produces

n | variance | corr
----+----------------------+------
1 | |
2 | 0 |
3 | 0 |
4 | 0 |
5 | 3.5527136788005e-16 | 1
6 | 2.368475785867e-16 | 1
7 | 3.38353683695286e-16 | 1
8 | 0 |
9 | 0 |
10 | 0 |
(10 rows)

whereas in HEAD (with the Youngs-Cramer algorithm) it produces

n | variance | corr
----+------------------------+------
1 | |
2 | 0 |
3 | 0 |
4 | 0 |
5 | 0 |
6 | 5.259072701473412e-33 | 1
7 | 4.382560584561177e-33 | 1
8 | 3.756480501052437e-33 | 1
9 | 3.2869204384208825e-33 | 1
10 | 6.817316464872942e-33 | 1
(10 rows)

so the errors in the variance are smaller, but any non-zero error
makes the correlation completely wrong.

> I wonder whether it'd be worth carrying additional state to
> check that explicitly (instead of assuming that "if (Sxx == 0 ||
> Syy == 0)" will catch it).

I wondered the same thing. It's not nice to have to do that, but
clearly the existing test for constant inputs is no good. The question
is, do we really want to spend extra cycles on every query just to
catch this odd corner case?

Regards,
Dean

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2025-12-02 20:26:43 Re: BUG #19340: Wrong result from CORR() function
Previous Message Laurenz Albe 2025-12-02 17:51:14 Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation