Re: Aggregate function API versus grouping sets

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Aggregate function API versus grouping sets
Date: 2014-07-02 17:52:04
Message-ID: 87egy3itnx.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> Doing rollup via GroupAggregate by maintaining multiple transition
>> values at a time (one per grouping set) means that the transfn is
>> being called interleaved for transition values in different
>> contexts. So the question becomes: is it wrong for the transition
>> function to assume that aggcontext can be cached this way, or is
>> it necessary for the executor to use a separate flinfo for each
>> concurrent grouping set?

Tom> Hm. We could probably move gcontext into the per-group data.
Tom> I'm not sure though if there are any other dependencies there on
Tom> the groups being evaluated serially. More generally, I wonder
Tom> whether user-defined aggregates are likely to be making
Tom> assumptions that will be broken by this.

The thing is that almost everything _except_ aggcontext and
AggGetPerAggEContext that a transfn might want to hang off fn_extra
really is going to be constant across all groups.

The big question, as you say, is whether this is going to be an issue
for existing user-defined aggregates.

>> Since it seems that the cleanup callback is the sole reason for
>> this function to exist, is it acceptable to remove any implication
>> that the context returned is the overall per-output-tuple one, and
>> simply state that it is a context whose cleanup functions are
>> called at the appropriate time before aggcontext is reset?

Tom> Well, I think the implication is that it's the econtext in which
Tom> the result of the aggregation will be used.

In the rollup case, though, it does not seem reasonable to have
multiple result-tuple econtexts (that would significantly complicate
the projection of rows, the handling of rescans, etc.).

Tom> Another approach would be to remove AggGetPerAggEContext as such
Tom> from the API altogether, and instead offer an interface that
Tom> says "register an aggregate cleanup callback", leaving it to the
Tom> agg/window core code to figure out which context to hang that
Tom> on. I had thought that exposing this econtext and the
Tom> per-input-tuple one would provide useful generality, but maybe
Tom> we should rethink that.

Works for me.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2014-07-02 17:52:53 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Previous Message Robert Haas 2014-07-02 17:08:55 Re: /proc/self/oom_adj is deprecated in newer Linux kernels