Re: proposal: array utility functions phase 1

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: array utility functions phase 1
Date: 2002-12-10 03:48:13
Message-ID: 3DF563FD.8040601@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> This seems fairly unworkable to me as-is. By definition, WHERE selects
> from a cross-product of the FROM tables; to make the above do what you
> want, you'd have to break that fundamental semantics. The semantics of
> explicit JOIN cases would be broken too.
>
> What we need is some kind of explicit multi-level SELECT operation.
> Perhaps it would help to think about the analogy of aggregates of
> aggregate functions, which are impossible to express properly in a
> single SELECT but work nicely given subselect-in-FROM.
> Subselect-in-FROM doesn't seem to get this job done though.
>
> Right offhand I don't see any reasonable syntax other than
> function-in-the-SELECT-list, which shoots us right back into the
> messinesses of the Berkeley implementation. However, we do now have the
> precedent of the table-function AS clause. Does it help any to do
> something like
>
> SELECT grosysid, array_values(grolist) AS (array_index,member_id)
> FROM pg_group where groname = 'g2';

After further thought, and ignoring the difficulty of implementation, what
seems ideal is to be able to specify 'setof <datatype>' or 'setof
<composite-type>' as an input to the function, and fire the function once for
each row of the input. Basically, allow anything that now qualifies as a FROM
item -- a table reference, a subselect with AS clause, another table function,
or maybe even a join clause. Some (totally contrived) examples of how it would
look:

create table foo1(f1 int, f2 text);
insert into foo1 values(1,'a');
insert into foo1 values(2,'b');
insert into foo1 values(3,'c');

create table foo2(f1 int, f2 text);
insert into foo2 values(1,'w');
insert into foo2 values(1,'x');
insert into foo2 values(2,'y');
insert into foo2 values(2,'z');

create function funcfoo1(setof foo1) returns setof foo2 as 'select * from foo2
where foo2.f1 = $1.f1' language 'sql';
select * from funcfoo1(foo1);
f1 f2
----+-----
1 | w
1 | x
2 | y
2 | z

select * from funcfoo1((select * from foo1 where f1=1) as t);
f1 f2
----+-----
1 | w
1 | x

What do you think?

> (Again you could wrap this in an outer SELECT to transform the
> member_ids to member_names.)
>
> The real problem with the Berkeley approach shows up when you consider
> what happens with multiple table functions called in a single SELECT.
> The code we currently have produces the cross-product of the implied
> rows (or at least it tries to, I seem to recall that it doesn't
> necessarily get it right). That's pretty unpleasant, and though you can
> filter the rows in an outer SELECT, there's no way to optimize the
> implementation into a smarter-than-nested-loop join.

What if there was a way to declare that a table function returns sorted
results, and on which column(s)?

> It seems like somehow we need a level of FROM/WHERE producing some base
> rows, and then a set of table function calls to apply to each of the
> base rows, and then another level of WHERE to filter the results of the
> function calls (in particular to provide join conditions to identify
> which rows to match up in the function outputs). I don't see any way to
> do this without inventing new SELECT clauses out of whole cloth
> ... unless SQL99's WITH clause helps, but I don't think it does ...

Is this still needed given my approach above?

>>How ugly/difficult would it be to allow the planner to interrogate the
>>function and let the function report back a tupledesc based on the actual
>>runtime input parameters?
>
>
> Parse-time, not run-time. It could be done --- IIRC, the auxiliary
> "function info" call we introduced in the V1 fmgr protocol was
> deliberately designed to allow expansion in this sort of direction.
> But it would have to take a tupledesc (or some similar static
> description) and return another one.

Nice! I'll dig in to that a bit.

Thanks,

Joe

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nate Sommer 2002-12-10 04:07:52 tuple descriptors?
Previous Message Bruce Momjian 2002-12-10 03:23:28 Re: nested transactions

Browse pgsql-patches by date

  From Date Subject
Next Message Alvaro Herrera 2002-12-10 04:55:38 Re: psql's \d commands --- end of the line for
Previous Message Bruce Momjian 2002-12-10 01:57:44 Re: minor doc improvements