From: | Maxim Orlov <orlovmg(at)gmail(dot)com> |
---|---|
To: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Aggregate Function corr does not always return the correct value |
Date: | 2025-08-26 15:20:53 |
Message-ID: | CACG=ezY6Xj0=FqxE3FThxYLZnL=_uNf10yt6+OoDa8+PVhuD9w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
One of the clients complained as to why the query for calculating the
correlation coefficient with the CORR function yielded such weird
results. After a little analysis, it was discovered that they were
calculating the correlation coefficient for two sets, one of which is
more or less random and the other of which is simply a set of constant
values (0.09 if that matters). As a result, they were attaining
unexpected results. However, as far as I am aware, they should have
received NULL because it is impossible to calculate the standard
deviation for such a set.
It turns out that for some values, the function does not return NULL.
Here is a reproducing of the problem:
==============================
postgres=# WITH dataset AS (SELECT x, 0.125 AS y FROM generate_series(0, 5)
AS x) SELECT corr(x, y) FROM dataset;
corr
------
(1 row)
postgres=# WITH dataset AS (SELECT x, 0.1 AS y FROM generate_series(0, 5)
AS x) SELECT corr(x, y) FROM dataset;
corr
--------------------
0.6546536707079771
(1 row)
==============================
I'm not sure how to fix that yet, but the issue is that the Sxx and Syy
variables in the float8_corr function are very close to 0, but not zero,
so we can't return NULL.
--
Best regards,
Maxim Orlov.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2025-08-26 15:52:36 | Re: C11 / VS 2019 |
Previous Message | Michael Banck | 2025-08-26 15:17:28 | Re: Dead code in ps_status.c |