Re: [PATCH] Allow multiple recursive self-references

From: Denis Hirn <denis(dot)hirn(at)uni-tuebingen(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pantelis Theodosiou <ypercube(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: [PATCH] Allow multiple recursive self-references
Date: 2022-01-11 11:33:17
Message-ID: 13C29E8C-553C-40C7-9F4A-D8F3A63E97D8@uni-tuebingen.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I have been studying this a bit more. I don't understand your argument here.
> Why would this query have different semantics than, say
>
> WITH RECURSIVE t(n) AS (
> VALUES (1)
> UNION ALL
> VALUES (2)
> UNION ALL
> SELECT n+1 FROM t WHERE n < 100
> ) SELECT * FROM t LIMIT 100;
>
> The order of UNION branches shouldn't be semantically relevant.

WITH RECURSIVE (ab)uses the (typically associative and commutative) UNION [ALL] clause,
and fundamentally changes the semantics – associativity and commutativity no longer apply.
I think your confusion stems from this ambiguity.

Let me briefly reiterate WITH RECURSIVE. Basically, you always have a query like this:

> WITH RECURSIVE w(c1,...) AS (
> <non-recursive>
> UNION [ALL]
> <recursive>
> ) q;

There must be a non-recursive part that does not refer to w, and -- without
the patch -- exactly one recursive part that refers to w. The non-recursive part,
and the recursive part are combined using UNION [ALL].

However, let's assume a different, unambiguous syntax just to make my point:

> WITH RECURSIVE w(c1,...) AS (
> <non-recursive>
> RUNION [ALL]
> <recursive>
> ) q;

Everything remains the same except that the non-recursive part and the recursive
part are combined using RUNION [ALL] instead of UNION [ALL].

Now let me rephrase your examples using this syntax:

> WITH RECURSIVE t(n) AS (
> (VALUES (1) UNION ALL VALUES (2))
> RUNION ALL
> SELECT n+1 FROM t WHERE n < 100
> )
> SELECT sum(n) FROM t;

> WITH RECURSIVE t(n) AS (
> VALUES (1) RUNION ALL (VALUES (2)
> UNION ALL
> SELECT n+1 FROM t WHERE n < 100)
> )
> SELECT sum(n) FROM t;

I hope this shows that this is not the same. The first query has two non-recursive
cases and one recursive case. The second query has two recursive cases instead.

The rewrites of those examples:

> RUNION RUNION
> / \ good / \
> VALUES(1) UNION --> VALUES(1) UNION
> / \ / \
> SELECT n+1... VALUES(2) VALUES(2) SELECT n+1...

This rewrite would be valid. The patch would not do that, however.

> RUNION RUNION
> / \ bad / \
> VALUES(1) UNION --> UNION SELECT n+1...
> / \ / \
> SELECT n+1... VALUES(2) VALUES(1) VALUES(2)

This is the rewrite you would expect. But it is not valid, because it changes semantics.
Therefore the patch does not do that.

> I'm having difficulties understanding which subset of cases your patch wants to address.

With this patch an extended WITH RECURSIVE syntax is enabled:

> WITH RECURSIVE w(c1,...) AS (
> <non-recursive>
> UNION [ALL]
> <recursive 1>
> ...
> UNION [ALL]
> <recursive n>
> ) q;

But really, it is:

> WITH RECURSIVE w(c1,...) AS (
> <non-recursive 1>
> ...
> <non-recursive n>
> UNION [ALL]
> <recursive n+1>
> ...
> UNION [ALL]
> <recursive m>
> ) q;

We can have arbitrarily many non-recursive branches (that is possible without the patch),
as well as arbitrarily many recursive UNION [ALL] branches. Problem here: UNION [ALL]
associates to the left. This means that we end up with a left-deep parse tree, that looks
something like this:

> RUNION
> / \
> ... m
> /
> UNION
> / \
> n n+1
> /
> ...
> /
> UNION
> / \
> 1 2

That is not correct, because all non-recursive branches must be part of the left
RUNION branch, and all recursive branches must be part of the right one. Postgres
performs this check in parse_cte.c using the checkWellFormedRecursion() function.

Having said the above, let me once again define the rewrite case the patch implements:

> RUNION RUNION
> / \ rotate / \
> ... n+m ---> UNION UNION
> / / \ / \
> UNION ... n n+1 UNION
> / \ / / \
> n n+1 UNION ... m
> / / \
> ... 1 2
> /
> UNION
> / \
> 1 2

By using tree rotation, the patch transforms the parse tree on the left
into the one on the right. All non-recursive terms 1..n are found in the
left branch of RUNION, all recursive terms n+1..m in the right branch.
This tree now passes the checkWellFormedRecursion() check.
I hope this clarifies things.

> The order of UNION branches shouldn't be semantically relevant.

I agree. However, a strict distinction must be made between RUNION and UNION clauses.
These must not be interchanged.

> I wonder why in ExecWorkTableScan() and ExecReScanWorkTableScan() you call
> tuplestore_copy_read_pointer() instead of just tuplestore_select_read_pointer().

The WorkTableScan reads the working_table tuplestore of the parent RecursiveUnion
operator. But after each iteration of the RecursiveUnion operator, the following
operations are performed:

> 122 /* done with old working table ... */
> 123 tuplestore_end(node->working_table); -- (1)
> 124
> 125 /* intermediate table becomes working table */
> 126 node->working_table = node->intermediate_table; -- (2)
> 127
> 128 /* create new empty intermediate table */
> 129 node->intermediate_table = tuplestore_begin_heap(false, false,
> 130 work_mem); -- (3)

https://doxygen.postgresql.org/nodeRecursiveunion_8c_source.html#l00122

In step (1), the current working_table is released. Therefore, all read pointers
that we had additionally allocated are gone, too. The intermediate table becomes
the working table in step (2), and finally a new empty intermediate table is
created (3).

Because of step (1), we have to allocate new unique read pointers for each worktable
scan again. Just using tuplestore_select_read_pointer() would be incorrect.

> What is the special role of read pointer 0 that you are copying. Before your
> changes, it was just the implicit read pointer, but now that we have several,
> it would be good to explain their relationship.

To allocate a new read pointer, tuplestore_alloc_read_pointer() could also be used.
However, we would have to know about the eflags parameter – which the worktable
scan has no information about.

The important thing about read pointer 0 is that it always exists, and it is initialized correctly.
Therefore, it is save to copy read pointer 0 instead of creating a new one from scratch.

> Also, the comment you deleted says "Therefore, we don't need a private read pointer
> for the tuplestore, nor do we need to tell tuplestore_gettupleslot to copy."
> You addressed the first part with the read pointer handling, but what about the
> second part? The tuplestore_gettupleslot() call in WorkTableScanNext() still
> has copy=false. Is this an oversight, or did you determine that copying is
> still not necessary?

That's an oversight. Copying is still not necessary. Copying would only be required,
if additional writes to the tuplestore occur. But that can not happen here.

Best,
-- Denis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2022-01-11 11:36:46 Re: ICU for global collation
Previous Message Alvaro Herrera 2022-01-11 11:23:41 Re: a misbehavior of partition row movement (?)