Re: BUG #15307: Low numerical precision of (Co-) Variance

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Erich Schubert <erich(at)debian(dot)org>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15307: Low numerical precision of (Co-) Variance
Date: 2018-08-28 19:05:13
Message-ID: CAEZATCXRXn40RJjwfSz0FoJfHAX9Y6gwYB2HdL30KVE8ozwCyA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On 9 August 2018 at 12:02, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> ... the YC algorithm is probably preferable. For the record,
> attached are both versions that I tried.
>

Here is an updated, more complete patch, based on the YC algorithm,
with updated regression tests for the September commitfest.

All the existing tests pass unchanged, although I'm somewhat surprised
that the current tests pass with no platform variations. I've added
new tests to cover infinity/NaN handling and parallel aggregation, and
to confirm the improved accuracy with large offsets. The latter tests
operate well within the limits of double precision arithmetic, so I
wouldn't expect any platform variation, but that's difficult to
guarantee. If there are problems, it may be necessary to round the
test results.
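
To illustrate the large-offset point, here is a minimal sketch in
Python (not taken from the attached patch) comparing the textbook
sum-of-squares formula with the Youngs-Cramer update as it is usually
stated. The function names and the sample data are made up for this
example.

```python
def naive_var_pop(values):
    """Textbook formula: (sum(x^2) - sum(x)^2 / n) / n."""
    n = 0
    sx = 0.0
    sxx = 0.0
    for x in values:
        n += 1
        sx += x
        sxx += x * x  # grows like offset^2, so low-order bits are lost
    return (sxx - sx * sx / n) / n


def yc_var_pop(values):
    """Youngs-Cramer: track N, Sx = sum(x) and Sxx = sum((x - mean)^2)."""
    n = 0
    sx = 0.0
    sxx = 0.0
    for x in values:
        n += 1
        sx += x
        if n > 1:
            # n*x - Sx is small when the inputs are close together,
            # regardless of how large their common offset is.
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1))
    return sxx / n


# Hypothetical data: small integers shifted by a large offset.
offset = 1e9
data = [offset + d for d in (1.0, 2.0, 3.0, 4.0, 5.0)]

print("naive:", naive_var_pop(data))
print("YC:   ", yc_var_pop(data))
```

With this data every intermediate value in the Youngs-Cramer version
is a small integer or simple fraction, so it returns the exact
population variance of 2.0, whereas the naive version has to subtract
two quantities of the order of 5e18 and cannot.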

Notable changes from the previous patch:

I have rewritten the overflow checks in the accum functions to be
clearer and more efficient which, if anything, makes these aggregates
now slightly faster than HEAD. More importantly though, I've added
explicit code to force Sxx to be NaN if any input is infinite, which
the previous coding didn't guarantee. I think NaN is the right result
for quantities like variance, if any input value is infinite, since it
logically involves 'infinity minus infinity'. That's also consistent
with the current behaviour.
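
A rough Python sketch of why an explicit check is needed, assuming
the usual Youngs-Cramer update. The `force_nan` flag and the exact
form of the check are assumptions made for illustration; the actual
test in the patch may be written differently.

```python
import math


def yc_sxx(values, force_nan):
    """Return Sxx after accumulating values with a Youngs-Cramer update."""
    n = 0
    sx = 0.0
    sxx = 0.0
    for x in values:
        n += 1
        sx += x
        if n > 1:
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1))
        # Assumed form of the explicit rule: any infinite input makes
        # Sxx NaN, and NaN then propagates through later updates.
        if force_nan and math.isinf(x):
            sxx = math.nan
    return sxx


values = [math.inf, 1.0]

# Without the explicit rule, the first (infinite) value leaves Sxx at 0,
# and the second update adds (-inf)^2 / 2, giving +inf rather than NaN.
print(yc_sxx(values, force_nan=False))

# With the rule, Sxx is NaN, matching 'infinity minus infinity'.
print(yc_sxx(values, force_nan=True))
print(math.inf - math.inf)
```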

I have also made the aggregate combine functions SQL-callable to make
testing easier -- there was a bug in the previous version due to a
typo which meant that float8_regr_combine() was incorrect when N1 was
non-zero and N2 was zero. That situation is unlikely to happen in
practice, and difficult to provoke deliberately without manually
calling the combine function, which is why I didn't spot it before.
The new tests cover all code branches, and make it easier to see that
the combine functions are producing the correct results.
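
For reference, a sketch of the single-variable combine step in
Python, including the two empty-state branches. This only tracks
(N, Sx, Sxx); float8_regr_combine() also carries the Y and cross
terms, which are omitted here. The helper names are hypothetical.

```python
def yc_state(values):
    """Accumulate (N, Sx, Sxx) for a list of floats."""
    n, sx, sxx = 0, 0.0, 0.0
    for x in values:
        n += 1
        sx += x
        if n > 1:
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1))
    return (n, sx, sxx)


def yc_combine(s1, s2):
    """Merge two partial states; either side may be empty."""
    n1, sx1, sxx1 = s1
    n2, sx2, sxx2 = s2
    if n1 == 0:
        return s2  # nothing accumulated on the left
    if n2 == 0:
        return s1  # nothing accumulated on the right
    n = n1 + n2
    tmp = sx1 / n1 - sx2 / n2  # difference of the two partial means
    return (n, sx1 + sx2, sxx1 + sxx2 + n1 * n2 * tmp * tmp / n)


left = yc_state([1.0, 2.0])
right = yc_state([3.0, 4.0, 5.0])
empty = yc_state([])

print(yc_combine(left, right))  # same state as yc_state([1.0, ..., 5.0])
print(yc_combine(left, empty))  # N1 non-zero, N2 zero: returns left as-is
```

Calling the combine step directly like this is the only easy way to
reach the N2 = 0 branch, which is the point of making the real
functions SQL-callable.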

Regards,
Dean

Attachment Content-Type Size
implement-float-aggs-using-youngs-cramer-v2.patch application/octet-stream 36.5 KB