Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Florian Pflug <fgp(at)phlo(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-21 11:46:59
Message-ID: EEEF4DC0-81B0-4EFB-9B6E-FAA507AD1A9B@phlo.org
Lists: pgsql-hackers

On Jan21, 2014, at 10:53 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions.
>
> I've written the test to do this as:
>
> IF state.value + value = state.value AND value <> 0 THEN
>     newstate.precision_lost := true;
>     newstate.value := state.value;
> ELSE
>     newstate.precision_lost := false;
>     newstate.value := state.value + value;
> END IF;
>
>
> The inverse transition function checks the precision_lost and if it's true it
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value
and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.
"SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet
"SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where "1.01"
would have been correct. That's still too much precision loss.
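
(For anyone without a psql session handy, the same behaviour is reproducible in
any IEEE-754 double environment; Python floats are C doubles, matching float8,
so this quick check shows the proposed precision_lost test not firing even
though the addend's low-order digits are already gone:)

```python
# IEEE-754 double behaviour, matching PostgreSQL's float8.
a = 1e16
b = 1.01

# The proposed precision_lost test does NOT fire here: a + b != a ...
assert a + b != a

# ... yet subtracting a back yields 2.0 where 1.01 would be correct,
# because b was rounded to the nearest representable step (ulp = 2.0
# at magnitude 1e16) during the addition.
assert (a + b) - a == 2.0
```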

I'm quite certain that the general idea has merit, but the actual
invertibility condition is going to be more involved. If you want to play with
this, I think the first step has to be to find a set of guarantees that
SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
summands all have the same sign, the error is bounded by C * N, where C is some
(machine-specific?) constant (about 1e-15 or so), and N is the number of input
rows. Or at least so I think from looking at SUMs over floats in descending
order, which I guess is the worst case. You could, for example, try to find an
invertibility condition which guarantees the same, but allows C to be larger.
That would put a bound on the number of digits lost by the new SUM(float)
compared to the old one.
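
(To make concrete what such a relaxed condition could look like: the sketch
below is purely illustrative, not from the patch; the state layout, the error
budget C, and the threshold formula are all my invention. The idea is to flag
the state as non-invertible once an addend is so small relative to the running
sum that more than the budgeted relative precision of it is rounded away:)

```python
import sys

# Hypothetical error budget: allowed relative error per addend, chosen
# larger than machine epsilon (~2.2e-16) to keep more inputs invertible.
C = 1e-12

def forward(state, value):
    total, invertible = state
    new_total = total + value
    # ulp(new_total) is roughly abs(new_total) * epsilon. If that ulp
    # exceeds C * abs(value), the addend lost more than the budgeted
    # relative precision and a later inverse cannot restore it.
    if value != 0 and abs(value) < abs(new_total) * sys.float_info.epsilon / C:
        invertible = False
    return (new_total, invertible)

def inverse(state, value):
    total, invertible = state
    if not invertible:
        return None  # signal the core code to re-aggregate from scratch
    return (total - value, invertible)
```

Running the 1e16 / 1.01 case through forward() marks the state non-invertible,
so inverse() returns NULL and the core code re-aggregates, while sums of
similarly-sized values stay on the fast path.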

I don't have high hopes for this getting into 9.4, though.

> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types, but even
> if it did add too much overhead to forward aggregation it might be worth
> allowing aggregates to have 2 forward transition functions and if the 2nd
> one exists then it could be used in windowing functions where the frame
> does not have "unbounded following".

I don't think adding yet another type of aggregation function is the
solution here.

best regards,
Florian Pflug
