Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-04-09 08:05:06
Message-ID: CAEZATCWjejvZjiq0qQ90ntOOBSz4i+7g=e20YOfiWE0pOguBbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8 April 2014 21:48, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Apr7, 2014, at 17:41 , Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> I've just finished reading through all the other patches, and they all
>> look OK to me. It's mostly straightforward stuff, so despite the size
>> it's hopefully all committable once the base patch goes in.
>
> Hm, I'm starting to have second thoughts about the minmax patch. The
> inverse transition functions for MIN and MAX have a non-trivial probability
> of failure - they trigger a rescan whenever the value that is removed isn't
> strictly smaller (respectively strictly larger) then the current maximum
> (respectively minimum). Thus, whenever that happens, we both call the
> inverse transition function *and* (since it fails) restart the aggregation.
>
> For windows based on ROWS, this isn't too bad - even if we fail every second
> time, we still avoid half the rescans, which should be a net win if the
> average window size is > 2.
>
> But for RANGE-based windows, more than one call of the inverse transition
> function must succeed for us to save anything, since we must successfully
> remove *all* peers to avoid one restart. This greatly increases the chance
> that using the inverse transition function hurts rather then helps - the
> situation is worse the larger the average number of peers is.
>

Argh, I hadn't really considered that case. I suppose any imperfect
inverse transition function has the potential to make performance
worse rather than better. But working out the likelihood of that isn't
necessarily straightforward.

It might be possible to include some sort of heuristic based on the
known information --- the number of rows P in the peer group about to
be removed vs the total number N of rows aggregated so far. If the
data were fairly random, then a quick back-of-the-envelope calculation
suggests that trying the inverse min/max functions would be worthwhile
on average if P were less than around 0.4N, but of course different
data distributions could make that much worse. Even a perfect inverse
transition function isn't going to be much use if P > N/2 (e.g.,
imagine a case where the peer groups decrease in size exponentially),
so perhaps we should be including such a check anyway.

That's also assuming that the cost of the inverse transition function
is about the same as the cost of the forward function, which is not
necessarily the case. Perhaps imperfect inverse transition functions
should be assigned a higher cost, and that should be factored into the
decision as to whether they are likely to be worth trying. All that
feels very speculative though, and I think it's too late to be
considering that for 9.4, so yes, let's leave out the min/max
aggregates for now.

> I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch
> invtrans_bool - it previously was part of invtrans_minmax. Given the
> performance risk involved, I think that we probably shouldn't commit
> invtrans_minmax at this time. I should have brought this up earlier, but
> the issue had slipped my mind :-( Sorry for that.
>
>> I think that you're probably right that optimising
>> window_gettupleslot() to eliminate the O(n^2) behaviour can be left to
>> a later patch --- the existing performance benefits of this patch are
>> enough to justify its inclusion IMO. It would be nice to include the
>> trivial optimisation to window_gettupleslot() that I posted upthread,
>> since it gives such a big improvement for only a few lines of code
>> changed.
>
> Agreed, but since it's independent from the rest of the base patch,
> I kept it as a separate patch (invtrans_optimize) instead of merging it
> into the base patch. It should probably be committed separately too.
>

It would be good to commit at least the "base", "arith" and "optimize"
patches for 9.4. I think the "collecting" and "bool" patches are also
committable, but I also suspect that those aggregates are less well
used, so they could be considered lower priority.

>> If you post a new patch set, I'll mark it as ready for committer attention.
>
> New patch set is attached. The only difference to the previous one is that
> "Forward Transitions" and "Inverse Transitions" are now scaled with nloops,
> and that it includes your window_gettupleslot patch under the name
> invtrans_optimize.
>

OK, I'm marking this ready for committer attention, on the
understanding that that doesn't include the invtrans_minmax patch.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-04-09 08:21:26 Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Previous Message Nicolas Barbier 2014-04-09 08:01:07 Re: Proposal for Merge Join for Non '=' Operators