targetlist functions proposals (was SETOF input parameters)

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: targetlist functions proposals (was SETOF input parameters)
Date: 2003-01-03 01:10:03
Message-ID: 3E14E2EB.6040801@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

(moved from PATCHES back to HACKERS)

Tom Lane wrote:
> Oh, you're thinking about the multi-column aspect of it, not the
> multi-row aspect. You really ought to keep those strictly separate;
> their design and implementation problems are quite different IMHO.
> I find it quite confusing to refer to both cases as "SRFs".

[...snip...]

> Before that, though, you'd better put forward a workable user interface
> for this; I'd wonder in particular what the names of the expanded-out
> columns will be, and whether they'd be accessible from places that can
> normally see output column names (such as ORDER BY). And what if a
> multi-column function appears in the targetlist of a sub-SELECT?

I've put some thought into *two* proposals for how targetlist functions should
behave -- one for a function that returns multiple rows, and one for a
function that returns multiple columns. The need for this was highlighted
recently when I submitted a proposal for array utility functions; see:
http://archives.postgresql.org/pgsql-hackers/2002-12/msg00461.php

At this point I don't have a clear idea how the latter would be implemented
(or if it even *can be* implemented with reasonable effort), but I wanted to
try to get agreement on the interface behavior before getting too caught up in
how to make it work. I think the former is reasonably straightforward (but
could well be wrong).

This is fairly long, so if you're not interested please delete now and accept
my apologies :-)

Proposals are below. Thoughts?

Thanks,

Joe

=================================================================
User interface proposal for multi-row function targetlist entries
=================================================================
1. Only one targetlist entry may return a set.
2. Each targetlist item (other than the set returning one) is
repeated for each item in the returned set.

Examples illustrating the need (these work on cvs HEAD):

CREATE TABLE bar(f1 int, f2 text, f3 int);
INSERT INTO bar VALUES(1, 'Hello', 42);
INSERT INTO bar VALUES(2, 'Happy', 45);

CREATE TABLE foo(a int, b text);
INSERT INTO foo VALUES(42, 'World');
INSERT INTO foo VALUES(42, 'Everyone');
INSERT INTO foo VALUES(45, 'Birthday');
INSERT INTO foo VALUES(45, 'New Year');

CREATE OR REPLACE FUNCTION getfoo(int) RETURNS SETOF text AS '
SELECT b FROM foo WHERE a = $1
' language 'sql';

regression=# SELECT f1, f2, getfoo(f3) AS f4 FROM bar;
f1 | f2 | f4
----+-------+----------
1 | Hello | World
1 | Hello | Everyone
2 | Happy | Birthday
2 | Happy | New Year
(4 rows)

Note that this is exatly how things currently work, i.e. there
is no restriction to the number of set returning targetlist entries.
This lack of restriction leads to strange and unexpected results (at
least IMHO). Continuing the example:

CREATE TABLE foo2(a int, b text);
INSERT INTO foo2 VALUES(42, '!!!!');
INSERT INTO foo2 VALUES(42, '????');
INSERT INTO foo2 VALUES(42, '####');
INSERT INTO foo2 VALUES(45, '$$$$');

CREATE OR REPLACE FUNCTION getfoo2(int) RETURNS SETOF text AS '
SELECT b FROM foo2 WHERE a = $1
' language 'sql';

Now, what *should* the following return if we allow multiple set
returning functions in the targetlist? Here's what it currently
does:

regression=# SELECT f1, f2, getfoo(f3) AS f4, getfoo2(f3) AS f5 FROM bar;
f1 | f2 | f4 | f5
----+-------+----------+------
1 | Hello | World | !!!!
1 | Hello | Everyone | ????
1 | Hello | World | ####
1 | Hello | Everyone | !!!!
1 | Hello | World | ????
1 | Hello | Everyone | ####
2 | Happy | Birthday | $$$$
2 | Happy | New Year | $$$$
(8 rows)

Not very useful as there is no way to prevent the apparent cartesian
join. But now try:

TRUNCATE TABLE foo2;
INSERT INTO foo2 VALUES(42, '!!!!');
INSERT INTO foo2 VALUES(42, '????');
INSERT INTO foo2 VALUES(45, '####');
INSERT INTO foo2 VALUES(45, '$$$$');

regression=# SELECT f1, f2, getfoo(f3) AS f4, getfoo2(f3) AS f5 FROM bar;
f1 | f2 | f4 | f5
----+-------+----------+------
1 | Hello | World | !!!!
1 | Hello | Everyone | ????
2 | Happy | Birthday | ####
2 | Happy | New Year | $$$$
(4 rows)

Hmmm, what happened to that cartesian join?

Under the proposal the two previous scenarios are disallowed with an ERROR.

============================================================================
User interface proposal for multi-column function targetlist entries
============================================================================
1. One, or more, targetlist entries may be a multi-column (composite) type.
2. For functions declared to return a named composite type, the
column names and types are as prescribed by the type unless overridden
in an alias definition.
3. For functions declared to return a "record" type, a column
definition list would be required as an alias at runtime.
4. Any alias provided for a composite returning function must match
the number of columns returned, and types if provided.
5. The composite function column names would be accessible from
places that can normally see output column names (such as ORDER BY).
6. When a composite function appears in the targetlist of a sub-SELECT,
the function's columns should be available outside the sub-SELECT in
the same manner as the other targetlist entries in the sub-SELECT.

Examples (these are all contrived):

CREATE TABLE bar(f1 int, f2 int);
INSERT INTO bar VALUES(1, 2);
INSERT INTO bar VALUES(2, 3);

CREATE TABLE foo(a int, b text);
INSERT INTO foo VALUES(1, 'a');
INSERT INTO foo VALUES(2, 'b');
INSERT INTO foo VALUES(3, 'c');
INSERT INTO foo VALUES(4, 'd');

CREATE OR REPLACE FUNCTION getfoo(int) RETURNS SETOF foo AS '
SELECT * FROM foo WHERE a = $1
' language 'sql';

SELECT f1, getfoo(f2) AS f3 FROM bar;
ERROR: function getfoo alias does not match its declared type

SELECT f1, getfoo(f2) AS (f3, f4) FROM bar;
f1 | f3 | f4
----+----+-----
1 | 2 | b
2 | 3 | c
(2 rows)

SELECT f1, getfoo(f2) AS (f3 int, f4 text) FROM bar;
f1 | f3 | f4
----+----+-----
1 | 2 | b
2 | 3 | c
(2 rows)

SELECT f1, getfoo(f2) FROM bar;
f1 | a | b
----+---+-----
1 | 2 | b
2 | 3 | c
(2 rows)

DROP FUNCTION getfoo(int);
CREATE OR REPLACE FUNCTION getfoo(int) RETURNS SETOF record AS '
SELECT * FROM foo WHERE a = $1
' language 'sql';

SELECT f1, getfoo(f2) AS f3 FROM bar;
ERROR: A column definition list is required for functions returning RECORD

SELECT f1, getfoo(f2) AS (f3 int, f4 text) FROM bar;
f1 | f3 | f4
----+----+-----
1 | 2 | b
2 | 3 | c
(2 rows)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Weimer 2003-01-03 09:57:01 Re: why was libpq.so's version number bumped?
Previous Message Ross J. Reedstrom 2003-01-03 00:42:09 Re: Upgrading rant.

Browse pgsql-patches by date

  From Date Subject
Next Message Patric Bechtel 2003-01-03 15:20:01 New Patch for streaming result sets, now Nicks last patch incorporated
Previous Message Rod Taylor 2003-01-02 23:27:10 Re: ALTER TABLE .. SET WITH / WITHOUT OIDS