Re: Window-functions patch handling of aggregates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Window-functions patch handling of aggregates
Date: 2008-12-26 19:17:29
Message-ID: 14175.1230319049@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> writes:
> Yeah, it seems like adding a flag like iswindowable to aggregate
> functions is the safest option.

I agree with Hitoshi-san: that's passing information in the wrong
direction. The right direction is to make it visible to the called
function which context it's being called in, so that it can do the
right thing (or at worst, throw an error if it can't).

The current state of play is that the documented expectation for
aggregate functions is that they should do

if (fcinfo->context && IsA(fcinfo->context, AggState))
... optimized code for aggregate case ...
else
... support regular call, or throw error ...

However, a bit of grepping for AggState shows that this expectation
isn't met very well within the core distribution, let alone elsewhere.
There are about 10 aggregates that have optimizations like this, only
8 of which are playing by the rules --- the violators are:

tsearch2's tsa_rewrite_accum: just assumes it's been passed an AggState,
dumps core (or worse) if not
array_agg: Asserts it's been passed an AggState, dumps core if not

As submitted, Hitoshi's patch made the WindowAgg code pass its
WindowAggState to the aggregate functions, which at best would have the
result of disabling the internal optimizations of the aggregates, and
would result in a core dump in any aggregate that was taking a shortcut.
The working copy I have right now does this:

if (numaggs > 0)
{
/*
* Set up a mostly-dummy AggState to be passed to plain aggregates
* so they know they can optimize things. We don't supply any useful
* info except for the ID of the aggregate-lifetime context.
*/
winstate->aggstate = makeNode(AggState);
winstate->aggstate->aggcontext = winstate->wincontext;
}

and then passes the dummy AggState to plain aggregates, instead of
WindowAggState. This makes count(*) and most of the other aggs work
as desired, but it's not sufficient for array_agg because of that
release-the-data-structure-in-the-final-function optimization.

So the alternatives I see are:

1. Go back to Hitoshi's plan of passing WindowAggState to the
aggregates. This will require changing every one of the ten aggregates
in the core distro, as well as every third-party aggregate that has
a similar optimization; and we just have to keep our fingers crossed
that anyone who's taking a short-cut will fix their code before it
fails in the field.

2. Use an intermediate dummy AggState as I have in my version, but
document some convention for telling this from a "real" AggState
when needed. (Not hard, we just pick some field that would never be
zero in a real AggState and document testing that.) This is certainly
on the ugly side, but it would very substantially cut the number of
places that need changes. Only aggregates that are doing something
irreversible in their final-functions would need to be touched.

If we were working in a green field then #1 would clearly be the
preferable choice, but worrying about compatibility with existing
third-party aggregates is making me lean to #2. Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2008-12-26 19:29:21 Re: Window-functions patch handling of aggregates
Previous Message Tom Lane 2008-12-26 17:48:39 Re: Unused include/storage/itempos.h