Re: RFC: array_agg() per SQL:200n

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Neil Conway" <neilc(at)samurai(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: array_agg() per SQL:200n
Date: 2008-01-28 09:57:23
Message-ID: 87abmqs264.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Neil Conway" <neilc(at)samurai(dot)com> writes:

> AFAIK the conclusion reached by the previous thread was that to be type
> safe, you'd need one distinct pseudotype per aggregate function, along
> with some way to let the planner distinguish this class of pseudotypes
> from other types (in order to apply the heuristic that functions like
> these are likely to consume more memory). You could identify this class
> by an additional column in pg_type, but I think we'd need a lot of
> machinery to do this properly (e.g. to allow these types to be created
> via SQL). I wonder if this isn't over-engineering: the simple approach
> originally followed by Stephen Frost was to declare the transition value
> as, say, int8, and just disallow the transition and final functions from
> being called outside an aggregate context. AFAIK this would be safe,
> although of course it is ugly.

The alternative is to use the regular array type and have the implementation
of it have some magic behind the scenes.

I was already thinking we might need some magic like this for read-only cases
like:

select * where i in array[1,3,5,...]

or

for i in 1..n
var = arrayvar[i]
...
end

Both of these are O(n^2) (assuming the size of the array and the number of
loop iterations are both n). Each array IN scan or index lookup is O(n).

These cases might be easier to deal with, my idea was to memoize the array
contents in a hash data structure referenced by the parse tree fnextra
pointer. The array functions would check their function call site's fnextra
pointer to see if the array has previously been cached in the more efficient
form and is the same array and then use either hash probes for the IN case or
a C datum array for the latter case.

Could the same be done by the aggregate call site where the aggregate's type
is a plain anyarray like normal, but the array_accum call would look at the
call site and stash the actual contents there in a linked list or tuplesort?
The actual anyarray data type would just have a flag saying "the data's over
there".

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Weimer 2008-01-28 10:37:03 Re: plperl: Documentation on BYTEA decoding is wrong
Previous Message Russell Smith 2008-01-28 08:59:18 Re: Proposed patch: synchronized_scanning GUC variable