| From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Oleg Ivanov <o15611(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19340: Wrong result from CORR() function |
| Date: | 2025-12-05 10:17:15 |
| Message-ID: | CAEZATCUyPKDb5z7yasp5v5HN9dTzPKqqa4u20goJK2fC15LpcQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
On Wed, 3 Dec 2025 at 22:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Poking at this, I soon found a test case where even with the separate
> sqrt() calls we'd produce a result slightly outside [-1, 1] (running
> this test over more values of x is sufficient). So now I think we
> should do both the separate sqrt and the clamp.
>
I'm starting to have doubts about having 2 sqrt() calls. The problem
is that it seems to produce a noticeable reduction in accuracy in
quite a few cases. This is especially noticeable with fully-correlated
data. For example:
SELECT n, (SELECT corr(x, x) FROM generate_series(1, n) x)
FROM generate_series(1, 10) g(n);
n | corr
----+--------------------
1 |
2 | 0.9999999999999998
3 | 0.9999999999999998
4 | 0.9999999999999998
5 | 0.9999999999999998
6 | 1
7 | 0.9999999999999999
8 | 1
9 | 0.9999999999999999
10 | 1
(10 rows)
Now I'm not sure that the current code can be expected to get cases
like this exactly right 100% of the time, but it's pretty close. For
example, if I do this:
WITH t1 AS (
SELECT n, random() * 1000 AS r FROM generate_series(1, 1000000) n
), t2 AS (
SELECT corr(r, r) FROM t1 GROUP BY n % 10000
)
SELECT count(*), count(*) FILTER (WHERE corr != 1) FROM t2;
on HEAD it produced corr = 1 every time I ran it, whereas the patch
gives rounding errors roughly 25% of the time, which seems likely to
be noticed.
Perhaps we should only use 2 sqrt()'s if the product Sxx * Syy overflows.
Regards,
Dean
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Anthonin Bonnefoy | 2025-12-05 13:33:26 | Re: Segfault due to NULL ParamExecData value |
| Previous Message | Dean Rasheed | 2025-12-05 09:23:59 | Re: BUG #19340: Wrong result from CORR() function |