Re: [PATCH] Allow multiple recursive self-references

From: Denis Hirn <denis(dot)hirn(at)uni-tuebingen(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pantelis Theodosiou <ypercube(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: [PATCH] Allow multiple recursive self-references
Date: 2021-09-21 11:35:13
Message-ID: AC56483A-D07D-4724-975B-FC590CBF3F3B@uni-tuebingen.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I studied up on the theory and terminology being discussed here. I conclude that what the latest version of this patch is doing (allowing multiple recursive references, but only in a linear-recursion way) is sound and useful.

That's great to hear!

> I haven't looked closely at the implementation changes yet. I'm not very familiar with that part of the code, so it will take a bit longer. Perhaps you could explain what you are doing there, either in email or (even better) in the commit message.

I have extended the commit message. Some more discussion (regarding tree rotation etc.) can be found
in this mail, but also in the commit message.

> What I think this patch needs is a lot more test cases. I would like to see more variants of different nestings of the UNION branches, some mixtures of UNION ALL and UNION DISTINCT, joins of the recursive CTE with regular tables (like in the flights-and-trains examples)

You are right, the testing is a bit sparse at the moment. I've added some more
test cases with the new version of this patch. Also I've improved the comments.

> as well as various cases of what is not allowed, namely showing that it can carefully prohibit non-linear recursion.

The regression tests already feature tests against non-linear recursion.
Therefore I did not add more myself.

> Also, in one instance you modify an existing test case. I'm not sure why. Perhaps it would be better to leave the existing test case alone and write a new one.

I agree that it would be better not to modify the existing test case, but the
modification is unavoidable. Here are the original queries:

> -- non-linear recursion is not allowed
> WITH RECURSIVE foo(i) AS
> (values (1)
> UNION ALL
> (SELECT i+1 FROM foo WHERE i < 10
> UNION ALL
> SELECT i+1 FROM foo WHERE i < 5)
> ) SELECT * FROM foo;

> WITH RECURSIVE foo(i) AS
> (values (1)
> UNION ALL
> SELECT * FROM
> (SELECT i+1 FROM foo WHERE i < 10
> UNION ALL
> SELECT i+1 FROM foo WHERE i < 5) AS t
> ) SELECT * FROM foo;

These two test cases are supposed to trigger the non-linear recursion check,
and they do without this patch. However, with the modifications this patch
introduces, both queries are now valid input. That's because each recursive
reference is placed inside a separate recursive UNION branch. This means that
both queries are linear recursive, and not non-linear recursive as they should be.

To make these tests check for non-linear recursion again, at least one
UNION branch must contain more than one recursive reference. That's the
modification I did.

> You also introduce this concept of reshuffling the UNION branches to collect all the recursive terms under one subtree.

Yes, but the reshuffling is only applied in a very specific situation. Example:

> UNION ---> UNION
> / \ / \
> UNION C A UNION
> / \ / \
> A B B C

A, B, and C are arbitrary SelectStmt nodes and can themselfes be deeper nested
UNION nodes.

A is a non-recursive term in the WITH RECURSIVE query. B, and C both contain a
recursive self-reference. The planner expects the top UNION node to contain
the non-recursive term in the larg, and the recursive term in the rarg.
Therefore, the tree shape on the left is invalid and would result in an error
message at the parsing stage. However, by rotating the tree to the right, this
problem can be solved so that the valid tree shape on the right side is created.

All this does, really, is to re-parenthesize the expression:
(A UNION B) UNION C ---> A UNION (B UNION C)

> Also, currently a query like this works [...] but this doesn't:
>
> WITH RECURSIVE t(n) AS (
> SELECT n+1 FROM t WHERE n < 100
> UNION ALL
> VALUES (1)
> )
> SELECT sum(n) FROM t;
>
> With your patch, the second should also work, so let's show some tests for that as well.

With just the tree rotation, the second query can not be fixed. The order of two
nodes is never changed. And I think that this is a good thing. Consider the
following query:

> WITH RECURSIVE t(n) AS (
> VALUES (1)
> UNION ALL
> SELECT n+1 FROM t WHERE n < 100
> UNION ALL
> VALUES (2)
> ) SELECT * FROM t LIMIT 100;

If we'd just collect all non-recursive UNION branches, the semantics of the
second query would change. But changing the semantics of a query (or preventing
certain queries to be formulated at all) is not something I think this patch
should do. Therfore – I think – it's appropriate that the second query fails.

Best,
-- Denis

Attachment Content-Type Size
0009-Add-SQL-standard-multiple-linear-self-references-in-.patch application/octet-stream 19.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-09-21 11:42:20 Re: row filtering for logical replication
Previous Message Tomas Vondra 2021-09-21 11:28:35 Re: mem context is not reset between extended stats