Re: Common Table Expressions (WITH RECURSIVE) patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql(at)j-davis(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions (WITH RECURSIVE) patch
Date: 2008-09-16 18:32:23
Message-ID: 13449.1221589943@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> Included is the latest patches against CVS HEAD.

I spent some time reading this patch. Here are a few comments in
no particular order:

RangeRecursive node lacks copyfuncs/equalfuncs support.

Query.recursive is missed in equalfuncs.c. But rather than fix that,
get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe,
and what is the value of that test? The callers know perfectly well
whether they are operating on a recursive RTE or not. You might as well
just delete all the useless qual-pushdown-attempt code from
set_recursion_pathlist, and not need to touch qual_is_pushdown_safe
at all.

Is physical_tlist optimization sensible for RecursiveScan? We seem
to use it for every other Scan node type.

I dislike putting state into ExecutorState; that makes it impossible
to have multiple recursion nodes in one plan tree. It would probably
be better for the Recursion and RecursiveScan nodes to talk to each
other directly (compare HashJoin/Hash); although since they are not
adjacent in the plan tree I admit I'm not sure how to do that.

es_disallow_tuplestore doesn't seem to need to be in ExecutorState
at all, it could as well be in RecursionState.

I don't really like the way that Append nodes are being abused here.
It would be better to allow nodeRecursion.c to duplicate a little code
from nodeAppend.c, and have the child plans be direct children of
the Recursion node. BTW, is it actually possible to have more than
two children? I didn't spend enough time analyzing the restrictions
in parse_cte to be sure. If there are always just two then you could
simplify the representation by treating it like a join node instead
of an append. (The RTE_RECURSIVE representation sure makes it look
like there can be only two...)

Mark/restore support seems useless ... note the comment on
ExecSupportsMarkRestore (which should be updated if this code
isn't removed).

RecursiveScan claims to support backwards fetch, but does not in fact
contain code to do so. (Given that it will always be underneath
Recursion, which can't do backwards fetch, I see little point in adding
such code; fix execAmi.c instead.)

ExecInitRecursion doesn't seem to be on the same page about whether
it supports backward scan as execAmi.c, either.

This comment in nodeRecursivescan.c seems bogus:
/*
* Do not initialize scan tuple type, result tuple type and
* projection info in ExecInitRecursivescan. These types are
* initialized after initializing Recursion node.
*/
because the code seems to be doing exactly what the comment says it
doesn't.

Numerous comments appear to have been copied-and-pasted and not modified
from the original. Somebody will have to go over all that text.

ruleutils.c fails completely for non-recursive WITH. It *must* regenerate
such a query with a WITH clause, not as a flattened subquery which is what
you seem to be doing. This isn't negotiable because the semantics are
different. This will mean at least some change in the parsetree
representation. Perhaps we could add a bool to subquery RTEs to mark them
as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE
seem a bit ugly too. If it thinks that can't happen it should Assert so,
not just fall through silently.

commentary for ParseState.p_ctenamespace is gratuitously unlike the
comment style for the other fields, and p_recursive_namespace isn't
documented at all.

ParseState.p_in_with_clause is unused, should be removed.

The WithClause struct definition is poorly commented. It should be
stated that it is used only pre-parse-analysis (assuming that continues
to be true after you get done fixing ruleutils.c...), and it doesn't
say what the elements of the subquery list are (specifically, what
node type). A lot of the other added structs and fields could use
better commenting too.

For that matter "subquery" is a poor name for WithClause's list of CTEs,
especially so since it's hard to search for. It should be a plural name
and I'd be inclined to use something like "ctes" not "subqueries".
The term "subquery" is too overloaded already, so any place you can
refer to a WITH-list member as a CTE you should do so.

WithClause node may need a location field, and almost certainly has to
be handled somehow in exprLocation().

The error reports in parse_cte.c *desperately* need error locations.

Why does transformWithClause do parse_sub_analyze twice?
I'm not sure that's even safe, and it's surely unnecessary.
Also, what happens if a subquery isn't a SelectStmt? Silently
doing nothing doesn't seem like a good plan there.

Why are we putting essentially the same information into both
p_recursive_namespace and p_ctenamespace? Is there really a need
for both lists? The code added to transformFromClauseItem
seems quite wrong since it searches both lists even if it found a
match in the first one. This whole area looks like it needs
refactoring.

Costing is all bogus, but we knew that...

Why does set_recursion_pathlist think that the subquery might have
useful pathkeys? We know it must always be a UNION ALL, no?

PlanState.has_recursivescan seems like a complete kluge. Can't it just be
removed? It looks to me like it is working around bugs that hopefully aren't
there anymore. There is certainly no reason why a recursive CTE should be
more in need of rescanning than any other kind of plan. If it is needed then
the current implementation is completely broken anyway, since it would only
detect a RecursiveScan node that is directly underneath an agg or hash node.

Please pay some attention to keeping things in logical, consistent orders.
For instance the withClause field was inserted into _copySelectStmt()
in a different place from where it was inserted in the actual struct
declaration, which is confusing.

parseTypeString() ought to check for null withClause.

expression_tree_walker/mutator support seems entirely broken for
RTE_RECURSIVE RTEs. Shouldn't it be recursing into the subquery?

Missed adding non_recursive_query to the "zap unneeded substructure" part
of set_plan_references (assuming it really is unneeded).

There seem to be quite a number of places where RTE_SUBQUERY RTEs
are handled but the patch fails to add RTE_RECURSIVE handling ...

It's a really bad idea to use RTE subquery field over again for
RTE_RECURSIVE, especially without any comment saying you did that.
I would suggest two pointers in the RTE_RECURSIVE field list instead.

Do we really have to make RECURSIVE a fully reserved keyword?
(Actually, the patch makes it worse than reserved, by failing to
add it to the reserved_keywords list.)

checkCteTargetList is completely broken: it will only notice illegal
sublinks that are at the very top level of a targetlist expression.
checkWhereClause is very far short of adequate as well. Need to recurse
here, or find some other way. Given that the subexpressions haven't been
analyzed yet, this seems a bit messy --- expression_tree_walker doesn't
know about pre-analysis node trees, so you can't use it. I'd suggest
replacing this whole set of routines with just one recursive routine that
doesn't make pre-assumptions about which node types can be found where.
Alternatively, is there any way of delaying the validity checks until
*after* parse analysis of the expressions, so that you could use
expression_tree_walker et al?

BTW, it seems like a lot of the logic there could be simplified by depending
on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE.
There are a number of places that are taking the larger of two values
in baroque, hard-to-follow ways.

I wonder if checkCteSelectStmt is detecting nonlinearity correctly.
Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to
miss the problem in something like (self union (self union other)) ?
Maybe what you really need is a bitmask:
NON_RECURSIVE = 0,
RECURSIVE_SELF = 1,
RECURSIVE_OTHER = 2,
RECURSIVE_BOTH = 3 /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */
and then you can merge two values via OR instead of MAX.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-09-16 19:32:38 EXEC_BACKEND
Previous Message Simon Riggs 2008-09-16 17:41:02 Re: Subtransaction commits and Hot Standby