|From:||Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>|
|To:||Erich Schubert <erich(at)debian(dot)org>|
|Subject:||Re: BUG #15307: Low numerical precision of (Co-) Variance|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 7 August 2018 at 12:58, Erich Schubert <erich(at)debian(dot)org> wrote:
> I am surprised that this is possible to do with arbitrary precision
> here, as for example with 8 data points, it will eventually involve a
> division by 7, which cannot be represented even with arbitrary (finite)
> precision floating point. But maybe just this division comes late enough
> (after the difference) that it just works before being converted to a float
> for the division.
Right, the division comes at the end in that case.
> Instead of Welford, I recommend to use the Youngs-Cramer approach. It is
> almost the same; roughly it is aggregating sum-X and sum((X-meanX)²)
> directly (whereas Welford updates mean-X, and sum((X-meanX)^2)). So the
> first aggregate is actually unchanged to the current approach.
> In my experiments, this was surprisingly much faster when the data was a
> double; explainable by CPU pipelining (see the Figure in the paper I
> linked - the division comes late, and the next step does not depend on the
> division, so pipelining can already process the next double without waiting
> for the division to finish).
> Youngs & Cramer original publication:
OK, thanks for the reference. That was very interesting.
It was actually the Knuth variant of the Welford algorithm that I was
playing with, but I have now tried out the Youngs-Cramer algorithm as
In terms of performance the YC algorithm was maybe 2-3% slower than
Welford, which was roughly on a par with the current schoolbook
algorithm in PostgreSQL, but there was some variability in those
results, depending on the exact query in question. One thing to bear
in mind is that the kind of micro-optimisations that will make a huge
difference when processing in-memory data in tight loops as in your
examples are not nearly as important in the PostgreSQL environment
where there is a significant per-tuple overhead in fetching data,
depending on the enclosing SQL query. So the odd cycle here and there
from a division isn't going to matter much in this context.
Both algorithms eliminate the main loss-of-precision problem that the
current schoolbook algorithm suffers from, which I think has to be the
primary aim of this. The YC paper suggests that the results from their
algorithm might be slightly more accurate (although maybe not enough
that we really care). What I do like about the YC algorithm is that it
guarantees the same result from avg(), whereas the Welford algorithm
will change avg() slightly, possibly making it very slightly less
So I concur, the YC algorithm is probably preferable. For the record,
attached are both versions that I tried.
> If Postgres could use parallel aggregation, let me know. I can assist with
> the proper aggregation of several partitions (both distributed, or
> multithreaded). Using multiple partitions can also slightly improve
> precision; but it is mostly interesting to process multiple chunks in
PostgreSQL isn't multithreaded, but it will parallelise large
aggregates by distributing the computation over a small number of
worker backend processes, and then combining the results. I did a
quick back-of-the-envelope calculation to see how to combine the
results, and in testing it seemed to work correctly, but it would be
worth having a second pair of eyes to validate that.
I'll add this to the next commitfest for consideration.
|Next Message||shaurya jain||2018-08-09 12:16:08||Re: BUG #15319: pg_stat_user_tables not showing last vacuum date|
|Previous Message||Sergei Kornilov||2018-08-09 08:07:52||Re: BUG #15319: pg_stat_user_tables not showing last vacuum date|
|Next Message||Raúl Marín Rodríguez||2018-08-09 11:39:06||Do all rows from a set have the same TupleDesc?|
|Previous Message||Michael Paquier||2018-08-09 10:28:33||Re: Improve behavior of concurrent TRUNCATE|