Re: Common Table Expressions (WITH RECURSIVE) patch

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions (WITH RECURSIVE) patch
Date: 2008-09-08 17:08:55
Message-ID: 871vzumuoo.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Jeff" == Jeff Davis <pgsql(at)j-davis(dot)com> writes:

Jeff> * Mutual Recursion:

This limitation isn't at all uncommon in other implementations; DB2
docs for example say:

"If more than one common table expression is defined in the same
statement, cyclic references between the common table expressions are
not permitted. A cyclic reference occurs when two common table
expressions dt1 and dt2 are created such that dt1 refers to dt2 and
dt2 refers to dt1."

http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm

MSSQL's documentation is less clear, but it also appears not to allow
mutual recursion (not allowing forward references to CTEs).

"A CTE can reference itself and previously defined CTEs in the same
WITH clause. Forward referencing is not allowed."

http://msdn.microsoft.com/en-us/library/ms175972.aspx

Jeff> * RHS only:

Jeff> with recursive
Jeff> foo(i) as (select i+1 from foo where i < 10 union all values(1))
Jeff> select * from foo;
Jeff> ERROR: Left hand side of UNION ALL must be a non-recursive term in a
Jeff> recursive query

Jeff> The standard does not require that the recursive term be on
Jeff> the RHS.

Again, the standard may not, but existing implementations do:

MSSQL:

"The recursive CTE definition must contain at least two CTE query
definitions, an anchor member and a recursive member. Multiple anchor
members and recursive members can be defined; however, all anchor
member query definitions must be put before the first recursive member
definition. All CTE query definitions are anchor members unless they
reference the CTE itself. "

DB2:

"The following restrictions apply to a recursive common-table-expression:
* A list of column-names must be specified following the
table-identifier.
* The UNION ALL set operator must be specified.
* The first fullselect of the first union (the initialization
fullselect) must not include a reference to the
common-table-expression itself in any FROM clause.
"

Jeff> * UNION ALL only:

Jeff> with recursive
Jeff> foo(i) as (values(1) union select i+1 from foo where i < 10)
Jeff> select * from foo;
Jeff> ERROR: non-recursive term and recursive term must be combined with
Jeff> UNION ALL

Jeff> The standard seems to allow UNION ALL, UNION, INTERSECT, and
Jeff> EXCEPT (when the recursive term is not on the RHS of the
Jeff> EXCEPT).

Again, existing implementations disagree. See above for DB2, and for
MSSQL:

"Anchor members must be combined by one of these set operators: UNION
ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator
allowed between the last anchor member and first recursive member, and
when combining multiple recursive members."

Jeff> * Binary recursion and subselect strangeness:

Jeff> with recursive foo(i) as
Jeff> (values (1)
Jeff> union all
Jeff> select * from
Jeff> (select i+1 from foo where i < 10
Jeff> union all
Jeff> select i+1 from foo where i < X) t)
Jeff> select * from foo;

Jeff> Produces 10 rows of output regardless of what "X" is. This
Jeff> should be fixed for 8.4. Also, this is non-linear recursion,
Jeff> which the standard seems to disallow.

That looks like it should be disallowed somehow.

Jeff> * Multiple recursive references:
[snip]

Jeff> If we're going to allow non-linear recursion (which the
Jeff> standard does not), this seems like it should be a valid
Jeff> case.

We probably shouldn't allow it (as above).

[snip * Strange result with except: which looks like a bug]

Jeff> * Aggregates allowed: which

Jeff> with recursive foo(i) as
Jeff> (values(1)
Jeff> union all
Jeff> select max(i)+1 from foo where i < 10)
Jeff> select * from foo;

Jeff> Aggregates should be blocked according to the standard.
Jeff> Also, causes an infinite loop. This should be fixed for 8.4.

Does the standard require anywhere that non-conforming statements must
be diagnosed? (seems impractical, since it would forbid extensions)

Jeff> * DISTINCT should supress duplicates:

Jeff> with recursive foo(i) as
Jeff> (select distinct * from (values(1),(2)) t
Jeff> union all
Jeff> select distinct i+1 from foo where i < 10)
Jeff> select * from foo;

Jeff> This outputs a lot of duplicates, but they should be
Jeff> supressed according to the standard. This query is essentially
Jeff> the same as supporting UNION for recursive queries, so we
Jeff> should either fix both for 8.4 or block both for consistency.

Yeah, though the standard's use of DISTINCT in this way is something
of a violation of the POLA.

Jeff> * outer joins on a recursive reference should be blocked:

Jeff> with recursive foo(i) as
Jeff> (values(1)
Jeff> union all
Jeff> select i+1 from foo left join (values(1)) t on (i=column1))
Jeff> select * from foo;

Jeff> Causes an infinite loop, but the standard says using an outer
Jeff> join in this situation should be prohibited. This should be
Jeff> fixed for 8.4.

No. This has already been discussed; it's neither possible nor desirable
to diagnose all cases which can result in infinite loops, and there are
important types of queries which would be unnecessarily forbidden.

Besides, you've misread the spec here: it prohibits the recursive
reference ONLY on the nullable side of the join. You cite:

Jeff> outer joins not allowed to join recursive references:
Jeff> 7.13: Syntax Rules: 2.g.iii.6
Jeff> 7.13: Syntax Rules: 2.g.iii.7

6) WQEi shall not contain a <qualified join> QJ in which:

A) QJ immediately contains a <join type> that specifies FULL and a
<table reference> or <table factor> that contains a <query name>
referencing WQNj.

[no recursive FULL JOIN at all]

B) QJ immediately contains a <join type> that specifies LEFT and a
<table factor> following the <join type> that contains a <query
name> referencing WQNj.

[note "following the <join type>", so WQNj LEFT JOIN foo is allowed,
but foo LEFT JOIN WQNj is not]

C) QJ immediately contains a <join type> that specifies RIGHT and a
<table reference> preceding the <join type> that contains a
<query name> referencing WQNj.

[note "preceding the <join type>", so foo RIGHT JOIN WQNj is allowed,
but WQNj RIGHT JOIN foo is not]

para (7) is identical but for natural rather than qualified joins.

Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive
Jeff> queries. The standard does not seem to say that these should be
Jeff> rejected.

Note that supporting those in subqueries (including CTEs) is a separate
optional feature of the standard.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2008-09-08 17:22:27 Re: sql2008 diff sql2003
Previous Message Alvaro Herrera 2008-09-08 17:02:52 Re: sql2008 diff sql2003