Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-23 00:17:43
Message-ID: CAApHDvouryJ8C2im48Zr7dqoBbvvkg6u+6OmxWHmn++6KL1EZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:

> On Jan21, 2014, at 10:53 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > It came to me that it might be possible to implement inverse transitions
> > for floating point aggregates by just detecting if precision has been
> > lost during forward transitions.
> >
> > I've written the test to do this as:
> >
> > IF state.value + value = state.value AND value <> 0 THEN
> > newstate.precision_lost := true;
> > newstate.value := state.value;
> > ELSE
> > newstate.precision_lost := false;
> > newstate.value := state.value + value;
> > END IF;
> >
> >
> > The inverse transition function checks the precision_lost and if it's
> true it
> > returns NULL. The core code is now implemented (thanks to Florian) to
> > re-aggregate when NULL is returned from the inverse transition function.
>
> That's not sufficient, I fear. You can lose all significant digits of the
> value
> and still have precision_lost = false afterwards. Try summing over 1e16,
> 1.01.
> "SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet
> "SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where
> "1.01"
> would have been correct. That's still too much precision loss.
>
> I'm quite certain that the general idea has merit, but the actual
> invertibility condition is going to be more involved. If you want to play
> with
> this, I think the first step has to be to find a set of guarantees that
> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if
> the
> summands all have the same sign, the error is bound by C * N, where C is
> some
> (machine-specific?) constant (about 1e-15 or so), and N is the number of
> input
> rows. Or at least so I think from looking at SUMs over floats in descending
> order, which I guess is the worst case. You could, for example, try to see
> if
> you can find a invertibility conditions which guarantees the same, but
> allows
> C to be larger. That would put a bound on the number of digits lost by the
> new
> SUM(float) compared to the old one.
>
>
I guess if the case is that normal (non-window) sum(float) results are
undefined unless you add an order by to the aggregate then I guess there is
no possible logic to put in for inverse transitions that will make them
behave the same as an undefined behaviour.

> I don't have high hopes for this getting int 9.4, though.
>
>
Yeah I'm going to drop it. I need to focus on documents and performance
testing.

> > If it seems sound enough, then I may implement it in C to see how much
> > overhead it adds to forward aggregation for floating point types, but
> even
> > if it did add too much overhead to forward aggregation it might be worth
> > allowing aggregates to have 2 forward transition functions and if the 2nd
> > one exists then it could be used in windowing functions where the frame
> > does not have "unbounded following".
>
> I don't think adding yet another type of aggregation function is the
> solution here.
>
>
Why not?

I thought, if in the cases where we need to burden the forward transition
functions with extra code to make inverse transitions possible, then why
not have an extra transition function so that it does not slow down normal
aggregation?

My testing of sum(numeric) last night does not show any slow down by that
extra dscale tracking code that was added, but if it did then I'd be
thinking seriously about 2 forward transition functions to get around the
problem. I don't understand what would be wrong with that. The core code
could just make use of the 2nd function if it existed and inverse
transitions were thought to be required. i.e in a window context and does
not have UNBOUNDED PRECEDING.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2014-01-23 00:19:25 Re: Hard limit on WAL space used (because PANIC sucks)
Previous Message David Rowley 2014-01-23 00:07:29 Re: [PATCH] Negative Transition Aggregate Functions (WIP)