Skip site navigation (1) Skip section navigation (2)

Re: array_accum aggregate

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-patches(at)postgresql(dot)org, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: array_accum aggregate
Date: 2006-10-12 22:45:22
Message-ID: 20294.1160693122@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Another alternative would be to provide a seperate area for each
> aggregate to put any other information it needs.

I'm not convinced that that's necessary --- the cases we have at hand
suggest that the transition function is perfectly capable of doing the
storage management it wants.  The problem is how to declare to CREATE
AGGREGATE that we're using a transition function of this kind rather
than the "stupid" functions it expects.  When the function is doing its
own storage management, we'd really rather that nodeAgg.c stayed out of
the way and didn't try to do any datum copying at all; having it copy a
placeholder bytea or anyarray or whatever is really a waste of cycles,
not to mention obscuring what is going on.  If nodeAgg just provided a
pass-by-value Datum, which the transition function could use to store a
pointer to storage it's handling, things would be a lot cleaner.

After a little bit of thought I'm tempted to propose that we handle this
by inventing a new pseudotype called something like "aggregate_state",
which'd be declared in the catalogs as pass-by-value, thereby
suppressing useless copying activity in nodeAgg.c.  You'd declare the
aggregate as having stype = aggregate_state, and the transition function
would have signature
	sfunc(aggregate_state, ... aggregate-input-type(s) ...)
		returns aggregate_state
and the final function of course
	ffunc(aggregate_state) returns aggregate-result-type

aggregate_state would have no other uses in the system, and its input
and output functions would raise an error, so type safety is assured
--- there would be no way to call either the sfunc or ffunc "manually",
except by passing a NULL value, which should be safe because that's what
they'd expect as the aggregate initial condition.

One advantage of doing it this way is that the planner could be taught
to recognize aggregates with stype = aggregate_state specially, and make
allowance for the fact that they'll use more workspace than meets the
eye.  If we don't have something like this then the planner is likely to
try to use hash aggregation in scenarios where it'd be absolutely fatal
to do so.  I'm not sure whether we'd want to completely forbid hash
aggregation when any stype = aggregate_state is present, but for sure we
want to assume that there's some pretty large amount of per-aggregate
state we don't know about.

			regards, tom lane

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2006-10-12 22:50:55
Subject: Re: [COMMITTERS] pgsql: Stamp 7.3.16.
Previous:From: Bruce MomjianDate: 2006-10-12 22:41:28
Subject: Re: ./configure argument checking

pgsql-patches by date

Next:From: Tom LaneDate: 2006-10-12 22:58:52
Subject: Re: array_accum aggregate
Previous:From: Stephen FrostDate: 2006-10-12 20:28:41
Subject: Re: array_accum aggregate

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group