Re: BUG #15307: Low numerical precision of (Co-) Variance

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Erich Schubert <erich(at)debian(dot)org>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #15307: Low numerical precision of (Co-) Variance
Date:	2018-08-28 19:05:13
Message-ID:	CAEZATCXRXn40RJjwfSz0FoJfHAX9Y6gwYB2HdL30KVE8ozwCyA@mail.gmail.com
Lists:	pgsql-bugs pgsql-hackers

On 9 August 2018 at 12:02, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> ... the YC algorithm is probably preferable. For the record,
> attached are both versions that I tried.
>

Here is an updated, more complete patch for the September commitfest,
based on the YC (Youngs-Cramer) algorithm, with updated regression tests.

All the existing tests pass unchanged, although I'm somewhat surprised
that the current tests pass with no platform variations. I've added
new tests to cover infinity/NaN handling and parallel aggregation, and
to confirm the improved accuracy with large offsets. The latter tests
operate well within the limits of double precision arithmetic, so I
wouldn't expect any platform variation, but that's difficult to
guarantee. If there are problems, it may be necessary to round the
test results.
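
To illustrate the kind of accuracy difference the large-offset tests
are meant to confirm, here is a minimal Python sketch. It is not code
from the patch: the function names and the sample data are made up for
illustration, and the baseline is assumed to be the textbook
sum-of-squares formula. The second function follows the Youngs-Cramer
recurrence on a running state (N, Sx, Sxx).

```python
def var_samp_naive(values):
    """Sample variance from running sums of x and x*x (textbook formula)."""
    n = sx = sxx = 0.0
    for x in values:
        n += 1.0
        sx += x
        sxx += x * x
    # Subtracting two huge, nearly equal numbers loses the low-order digits.
    return (sxx - sx * sx / n) / (n - 1.0)


def var_samp_yc(values):
    """Sample variance using the Youngs-Cramer update of (N, Sx, Sxx)."""
    n = sx = sxx = 0.0
    for x in values:
        n += 1.0
        sx += x
        if n > 1.0:
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1.0))
    return sxx / (n - 1.0)


# Hypothetical test data: the values 1..5 shifted by a large offset.
# The exact sample variance is 2.5 regardless of the offset.
values = [1e9 + k for k in range(1, 6)]

print(var_samp_yc(values))     # 2.5
print(var_samp_naive(values))  # not 2.5: the squares exceed 2**53
```

With this data every intermediate value in the Youngs-Cramer version is
a small integer or an exactly representable fraction, which is why the
result is exact, whereas the squares in the naive version are around
1e18 and cannot be stored exactly in a double.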

Notable changes from the previous patch:

I have rewritten the overflow checks in the accum functions to be
clearer and more efficient which, if anything, makes these aggregates
now slightly faster than HEAD. More importantly though, I've added
explicit code to force Sxx to be NaN if any input is infinite, which
the previous coding didn't guarantee. I think NaN is the right result
for quantities like variance, if any input value is infinite, since it
logically involves 'infinity minus infinity'. That's also consistent
with the current behaviour.
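
A rough Python sketch of why an explicit guard is needed, under the
assumption that the state is (N, Sx, Sxx) updated with the
Youngs-Cramer recurrence. The `accum` and `run` helpers and the
`force_nan` flag are hypothetical names used only for this
illustration, and the overflow checks mentioned above are omitted.

```python
import math


def accum(state, x, force_nan=True):
    """One Youngs-Cramer step on (N, Sx, Sxx); illustrative only."""
    n, sx, sxx = state
    n += 1.0
    sx += x
    if n > 1.0:
        tmp = x * n - sx
        sxx += tmp * tmp / (n * (n - 1.0))
    if force_nan and math.isinf(x):
        # An infinite input makes the variance 'infinity minus infinity'.
        sxx = math.nan
    return (n, sx, sxx)


def run(values, force_nan):
    state = (0.0, 0.0, 0.0)
    for v in values:
        state = accum(state, v, force_nan)
    return state


inf = math.inf
print(math.isnan(inf - inf))                   # True
print(run([inf, 1.0], force_nan=False)[2])     # inf, not nan
print(run([inf, 1.0], force_nan=True)[2])      # nan
```

Without the guard, an infinity followed by a finite value leaves Sxx as
infinity in this sketch, because tmp becomes -infinity and its square
is +infinity. Once Sxx has been set to NaN it stays NaN, since NaN
propagates through every later addition.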

I have also made the aggregate combine functions SQL-callable to make
testing easier -- there was a bug in the previous version due to a
typo which meant that float8_regr_combine() was incorrect when N1 was
non-zero and N2 was zero. That situation is unlikely to happen in
practice, and difficult to provoke deliberately without manually
calling the combine function, which is why I didn't spot it before.
The new tests cover all code branches, and make it easier to see that
the combine functions are producing the correct results.
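
For readers unfamiliar with combine functions, the following Python
sketch shows the general shape for a univariate (N, Sx, Sxx) state,
including the two empty-state branches of the kind described above. It
is an assumption-laden illustration, not the patch's code:
float8_regr_combine() carries more fields than this, and the names
`accumulate` and `combine` are invented here.

```python
def accumulate(values):
    """Build an (N, Sx, Sxx) state with the Youngs-Cramer update."""
    n = sx = sxx = 0.0
    for x in values:
        n += 1.0
        sx += x
        if n > 1.0:
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1.0))
    return (n, sx, sxx)


def combine(s1, s2):
    """Merge two partial states, as a parallel aggregate would."""
    n1, sx1, sxx1 = s1
    n2, sx2, sxx2 = s2
    # Empty-state branches: avoid dividing by a zero count below.
    if n1 == 0.0:
        return s2
    if n2 == 0.0:
        return s1
    n = n1 + n2
    sx = sx1 + sx2
    tmp = sx1 / n1 - sx2 / n2
    sxx = sxx1 + sxx2 + n1 * n2 * tmp * tmp / n
    return (n, sx, sxx)


left = accumulate([1.0, 2.0, 3.0])
right = accumulate([4.0, 5.0])
empty = accumulate([])

print(combine(left, right))   # (5.0, 15.0, 10.0)
print(combine(left, empty))   # (3.0, 6.0, 2.0)
```

Calling the combine step directly, as in the last line, is the only
easy way to exercise the branch where the second state is empty, which
is the reason for exposing such functions to tests.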

Regards,
Dean

Attachment	Content-Type	Size
implement-float-aggs-using-youngs-cramer-v2.patch	application/octet-stream	36.5 KB

In response to

Re: BUG #15307: Low numerical precision of (Co-) Variance at 2018-08-09 11:02:06 from Dean Rasheed

Responses

Re: BUG #15307: Low numerical precision of (Co-) Variance at 2018-09-27 05:12:23 from Madeleine Thompson
