|From:||Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>|
|To:||Erich Schubert <erich(at)debian(dot)org>|
|Subject:||Re: BUG #15307: Low numerical precision of (Co-) Variance|
On 9 August 2018 at 12:02, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> ... the YC algorithm is probably preferable. For the record,
> attached are both versions that I tried.
Here is an updated, more complete patch, based on the YC algorithm,
with updated regression tests for the September commitfest.
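
For readers unfamiliar with it, the Youngs-Cramer (YC) recurrence can be sketched roughly as follows. This is an illustrative Python sketch, not the patch's C code; the function names (yc_accum, var_samp_yc, var_samp_naive) are made up for this example, and the state is assumed to be the triple (N, Sx, Sxx), where Sxx is the running sum of squared deviations from the mean.

```python
def yc_accum(state, x):
    """Apply one Youngs-Cramer step to state = (n, sx, sxx)."""
    n, sx, sxx = state
    n += 1
    sx += x
    if n > 1:
        # tmp is n*x - sum(x); its square, scaled, is the increment
        # to the sum of squared deviations from the mean.
        tmp = x * n - sx
        sxx += tmp * tmp / (n * (n - 1))
    return (n, sx, sxx)


def var_samp_yc(values):
    """Sample variance via the YC recurrence (None for fewer than 2 inputs)."""
    state = (0, 0.0, 0.0)
    for x in values:
        state = yc_accum(state, float(x))
    n, _, sxx = state
    return sxx / (n - 1) if n > 1 else None


def var_samp_naive(values):
    """Textbook sum-of-squares formula, shown only for comparison."""
    n = len(values)
    sx = sum(float(x) for x in values)
    sx2 = sum(float(x) * float(x) for x in values)
    return (sx2 - sx * sx / n) / (n - 1) if n > 1 else None


# Small spread on top of a large offset.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
print(var_samp_yc(data))     # 1.0
print(var_samp_naive(data))  # may differ from 1.0 due to cancellation
```

The point of the sketch is that the YC state never stores the sum of squares of the raw inputs, so a large common offset does not consume the available precision.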
All the existing tests pass unchanged, although I'm somewhat surprised
that the current tests pass with no platform variations. I've added
new tests to cover infinity/NaN handling and parallel aggregation, and
to confirm the improved accuracy with large offsets. The latter tests
operate well within the limits of double precision arithmetic, so I
wouldn't expect any platform variation, but that's difficult to
guarantee. If there are problems, it may be necessary to round the
Notable changes from the previous patch:
I have rewritten the overflow checks in the accum functions to be
clearer and more efficient which, if anything, makes these aggregates
now slightly faster than HEAD. More importantly though, I've added
explicit code to force Sxx to be NaN if any input is infinite, which
the previous coding didn't guarantee. I think NaN is the right result
for quantities like variance, if any input value is infinite, since it
logically involves 'infinity minus infinity'. That's also consistent
with the current behaviour.
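
As a rough illustration of the infinity handling, here is a Python sketch under the same assumed (N, Sx, Sxx) state. It is not the patch's code: the function name accum_with_inf_check is hypothetical, and the overflow checks mentioned above are deliberately left out. It shows one case where, in this sketch, the recurrence alone would not yield NaN (a single infinite input), which an explicit check covers.

```python
import math


def accum_with_inf_check(state, x):
    """YC step that forces sxx to NaN once an infinite input is seen.

    Hypothetical sketch: overflow detection for finite inputs is omitted.
    """
    n, sx, sxx = state
    n += 1
    sx += x
    if n > 1:
        tmp = x * n - sx          # inf - inf here already gives NaN
        sxx += tmp * tmp / (n * (n - 1))
    if math.isinf(x):
        # Without this, a lone infinite input would leave sxx at 0.0.
        sxx = math.nan
    return (n, sx, sxx)


# 'infinity minus infinity' is NaN in IEEE 754 arithmetic.
print(math.isnan(math.inf - math.inf))  # True

state = (0, 0.0, 0.0)
for x in [1.0, math.inf, 2.0]:
    state = accum_with_inf_check(state, x)
print(math.isnan(state[2]))  # True
```

Once Sxx is NaN it stays NaN for all later inputs, since any arithmetic with NaN yields NaN.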
I have also made the aggregate combine functions SQL-callable to make
testing easier -- there was a bug in the previous version due to a
typo which meant that float8_regr_combine() was incorrect when N1 was
non-zero and N2 was zero. That situation is unlikely to happen in
practice, and difficult to provoke deliberately without manually
calling the combine function, which is why I didn't spot it before.
The new tests cover all code branches, and make it easier to see that
the combine functions are producing the correct results.
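
To make the empty-state branches concrete, here is a sketch of combining two partial states for the single-variable case. It is illustrative Python, not the patch's float8_regr_combine() (which also carries the y and xy sums); the names accum and combine are assumptions for this example, and the merge formula is the standard pairwise one for sums of squared deviations.

```python
def accum(state, x):
    """One Youngs-Cramer step on state = (n, sx, sxx)."""
    n, sx, sxx = state
    n += 1
    sx += x
    if n > 1:
        tmp = x * n - sx
        sxx += tmp * tmp / (n * (n - 1))
    return (n, sx, sxx)


def combine(state1, state2):
    """Merge two partial states, handling either side being empty."""
    n1, sx1, sxx1 = state1
    n2, sx2, sxx2 = state2
    if n1 == 0:
        return state2   # nothing on the left: result is the right state
    if n2 == 0:
        return state1   # nothing on the right: result is the left state
    n = n1 + n2
    sx = sx1 + sx2
    tmp = sx1 / n1 - sx2 / n2            # difference of the two means
    sxx = sxx1 + sxx2 + n1 * n2 * tmp * tmp / n
    return (n, sx, sxx)


def state_of(values):
    state = (0, 0.0, 0.0)
    for x in values:
        state = accum(state, float(x))
    return state


left, right = state_of([1, 2]), state_of([3, 4])
print(combine(left, right))            # (4, 10.0, 5.0)
print(combine(left, (0, 0.0, 0.0)))    # (2, 3.0, 0.5)
```

Calling the combine step directly with an empty right-hand state, as in the last line, is the kind of check that is hard to provoke through ordinary parallel aggregation.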