Re: UNNEST with multiple args, and TABLE with multiple funcs

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: UNNEST with multiple args, and TABLE with multiple funcs
Date: 2013-08-19 16:23:44
Message-ID: 52124690.8050309@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

2013-08-13 15:54 keltezéssel, Andrew Gierth írta:
> Summary:
>
> This patch implements a method for expanding multiple SRFs in parallel
> that does not have the surprising LCM behaviour of SRFs-in-select-list.
> (Functions returning fewer rows are padded with nulls instead.)
>
> It then uses this method combined with a parse-time hack to implement
> the (intended to be) spec-conforming behaviour of UNNEST with multiple
> parameters, including flattening of composite results.
>
> The upshot is that given a table like this:
>
> postgres=# select * from t1;
> a | b | c
> ---------------+-------------------+----------------------------------------------
> {11,12,13} | {wombat} |
> {5,10} | {foo,bar} | {"(123,xyzzy)","(456,plugh)","(789,plover)"}
> {21,31,41,51} | {fred,jim,sheila} | {"(111,xyzzy)","(222,plugh)"}
> (3 rows)
>
> (where column "c" is an array of a composite type with 2 cols, "x" and "y")
>
> You can do this:
>
> postgres=# select u.* from t1, unnest(a,b,c) with ordinality as u;
> ?column? | ?column? | x | y | ordinality
> ----------+----------+-----+--------+------------
> 11 | wombat | | | 1
> 12 | | | | 2
> 13 | | | | 3
> 5 | foo | 123 | xyzzy | 1
> 10 | bar | 456 | plugh | 2
> | | 789 | plover | 3
> 21 | fred | 111 | xyzzy | 1
> 31 | jim | 222 | plugh | 2
> 41 | sheila | | | 3
> 51 | | | | 4
> (10 rows)
>
> Or for an example of general combination of functions:
>
> postgres=# select * from table(generate_series(10,20,5), unnest(array['fred','jim']));
> ?column? | ?column?
> ----------+----------
> 10 | fred
> 15 | jim
> 20 |
> (3 rows)
>
> Implementation Details:
>
> The spec syntax for table function calls, <table function derived table>
> in <table reference>, looks like TABLE(func(args...)) AS ...
>
> This patch implements that, plus an extension: it allows multiple
> functions, TABLE(func1(...), func2(...), func3(...)) [WITH ORDINALITY]
> and defines this as meaning that the functions are to be evaluated in
> parallel.
>
> This is implemented by changing RangeFunction, function RTEs, and the
> FunctionScan node to take lists of function calls rather than a single
> function. The calling convention for SRFs is completely unchanged; each
> function returns its own rows (or a tuplestore in materialize mode) just
> as before, and FunctionScan combines the results into a single output
> tuple (keeping track of which functions are exhausted in order to
> correctly fill in nulls on a backwards scan).
>
> Then, a hack in the parser converts unnest(...) appearing as a
> func_table (and only there) into a list of unnest() calls, one for each
> parameter. So
>
> select ... from unnest(a,b,c)
>
> is converted to
>
> select ... from TABLE(unnest(a),unnest(b),unnest(c))
>
> and if unnest appears as part of an existing list inside TABLE(), it's
> expanded to multiple entries there too.
>
> This parser hackery is of course somewhat ugly. But given the objective
> of implementing the spec's unnest syntax, it seems to be the least ugly
> of the possible approaches. (The hard part of doing it any other way
> would be generating the description of the result type; composite array
> parameters expand into multiple result columns.)

Harder maybe but it may still be cleaner in the long run.

> Overall, it's my intention here to remove as many as feasible of the old
> reasons why one might use an SRF in the select list.

Indeed, it's a big nail in the coffin for SRFs-in-targetlist. Having
WITH ORDINALITY and this feature, I would vote for removing
SRF-in-targetlist and call the release PostgreSQL 10.0.

> This should also
> address the points that Josh brought up in discussion of ORDINALITY
> regarding use of SRF-in-select to unnest multiple arrays.
>
> (As a side issue, this patch also sets up pathkeys for ordinality along
> the lines of a patch I suggested to Greg a while back in response to
> his.)
>
> Current patch status:
>
> This is a first working cut: no docs, no tests, not enough comments, the
> deparse logic probably needs more work (it deparses correctly but the
> formatting may be suboptimal). However all the functionality is believed
> to be in place.

With this last paragraph in mind, I am trying a little review.

* Is the patch in a patch format which has context? (eg: context diff format)

Yes.

* Does it apply cleanly to the current git master?

Applies with some offsets on a few files but without fuzz.

* Does it include reasonable tests, necessary doc patches, etc?

No, as told by the patch author.

* Does the patch actually implement what it's supposed to do?

Yes.

* Do we want that?

Yes.

* Do we already have it?

No.

* Does it follow SQL spec, or the community-agreed behavior?

The SQL spec says these:

In 7.6 <table reference>

<table primary> ::=
...
| <table function derived table> [ AS ] <correlation name> [ <left paren> <derived column
list> <right paren> ]

also later in the same section:

<table function derived table> ::=
TABLE <left paren> <collection value expression> <right paren>

In 6.26 <value expression>

<collection value expression> ::=
<array value expression> | <multiset value expression>

In 6.36 <array value expression>

<array value expression> ::= <array concatenation> | <array primary>
<array concatenation> ::= <array value expression 1> <concatenation operator> <array primary>
<array value expression 1> ::= <array value expression>
<array primary> ::= <array value function> | <value expression primary>

6.3 <value expression primary>

<value expression primary> ::=
<parenthesized value expression>
| <nonparenthesized value expression primary>

<parenthesized value expression> ::= <left paren> <value expression> <right paren>

<nonparenthesized value expression primary> ::=
<unsigned value specification>
| <column reference>
| <set function specification>
| <window function>
| <nested window function>
| <scalar subquery>
| <case expression>
| <cast specification>
| <field reference>
| <subtype treatment>
| <method invocation>
| <static method invocation>
| <new specification>
| <attribute or method reference>
| <reference resolution>
| <collection value constructor>
| <array element reference>
| <multiset element reference>
| <next value expression>
| <routine invocation>

collection value constructor> ::=
| <array value constructor>
| <multiset value constructor>

So, the FROM TABLE(...) AS (...) syntax is a big can of worms and
I haven't even quoted <multiset value expression>.

As far as I can tell, these should also be allowed but isn't:

zozo=# select * from table('a'::text) as x;
ERROR: syntax error at or near "'a'"
LINE 1: select * from table('a'::text) as x;
^
zozo=# select x.* from t1, table(t1.a) as x;
ERROR: syntax error at or near ")"
LINE 1: select x.* from t1, table(t1.a) as x;
^
zozo=# select x.* from table((6)) as x(a int4);
ERROR: syntax error at or near "("
LINE 1: select x.* from table((6)) as x(a int4);
^
zozo=# select x.* from table(values (6)) as x(a int4);
ERROR: syntax error at or near "("
LINE 1: select x.* from table(values (6)) as x(a int4);
^
zozo=# select x.* from table(values(6)) as x(a int4);
ERROR: syntax error at or near "("
LINE 1: select x.* from table(values(6)) as x(a int4);
^

What the patch implements is only the last choice for
<nonparenthesized value expression primary>: <routine invocation>

When you add documentation, it would be nice to mention it.

Also, the grammar extension is a start for adding all the other
standard choices for TABLE().

* Does it include pg_dump support (if applicable)?

n/a

* Are there dangers?

I can't see any. 8-)

* Have all the bases been covered?

My previous comments about the TABLE() syntax says it all.
You can interpret it either way. :-)

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

I don't know.

* Are there any assertion failures or crashes?

No.

* Does the patch slow down simple tests?

No.

* If it claims to improve performance, does it?

It certainly improves writing queries, as functions inside
unnest() get processed in one scan.

* Does it slow down other things?

I don't think so.

* Does it follow the project coding guidelines?

Yes.

* Are there portability issues?

No.

* Will it work on Windows/BSD etc?

It should, the code uses standard internal PostgreSQL APIs
and extends them. No new system call.

* Are the comments sufficient and accurate?

According to the author, no.

* Does it do what it says, correctly?

Yes.

* Does it produce compiler warnings?

No.

* Can you make it crash?

No.

* Is everything done in a way that fits together coherently with other features/modules?

I think so

* Are there interdependencies that can cause problems?

I don't know.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 'Bruce Momjian' 2013-08-19 16:26:35 Re: 9.3 release notes suggestions
Previous Message Bruce Momjian 2013-08-19 16:15:43 Re: Feature Request on Extensions