Re: Common Table Expressions (WITH RECURSIVE) patch

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql(at)j-davis(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions (WITH RECURSIVE) patch
Date: 2008-09-18 03:55:25
Message-ID: 20080918.125525.133409720.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom, thanks for the review.

Here is an in-progress report. Patches against CVS HEAD attached.
(uncommented items are work-in-progress).
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> > Included is the latest patches against CVS HEAD.
>
> I spent some time reading this patch. Here are a few comments in
> no particular order:
>
> RangeRecursive node lacks copyfuncs/equalfuncs support.

Functions added.

> Query.recursive is missed in equalfuncs.c. But rather than fix that,
> get rid of it entirely. AFAICS the only use is in qual_is_pushdown_safe,
> and what is the value of that test? The callers know perfectly well
> whether they are operating on a recursive RTE or not. You might as well
> just delete all the useless qual-pushdown-attempt code from
> set_recursion_pathlist, and not need to touch qual_is_pushdown_safe
> at all.

Query.recursive removed and qual_is_pushdown_safe is untouched.

> Is physical_tlist optimization sensible for RecursiveScan? We seem
> to use it for every other Scan node type.

Fixed and physical_tlist optimization is enabled for RecursiveScan I
believe.

> I dislike putting state into ExecutorState; that makes it impossible
> to have multiple recursion nodes in one plan tree. It would probably
> be better for the Recursion and RecursiveScan nodes to talk to each
> other directly (compare HashJoin/Hash); although since they are not
> adjacent in the plan tree I admit I'm not sure how to do that.
>
> es_disallow_tuplestore doesn't seem to need to be in ExecutorState
> at all, it could as well be in RecursionState.

It was for a workaround to avoid an infinit recursion in some
cases. Discussion came to the conclusion that it's user's
responsibilty to avoid such that case (otherwise the semantics of our
recursive query becomes to be different from the one defined in the
standard) I believe.

es_disallow_tuplestore removed.

> I don't really like the way that Append nodes are being abused here.
> It would be better to allow nodeRecursion.c to duplicate a little code
> from nodeAppend.c, and have the child plans be direct children of
> the Recursion node. BTW, is it actually possible to have more than
> two children? I didn't spend enough time analyzing the restrictions
> in parse_cte to be sure. If there are always just two then you could
> simplify the representation by treating it like a join node instead
> of an append. (The RTE_RECURSIVE representation sure makes it look
> like there can be only two...)
>
> Mark/restore support seems useless ... note the comment on
> ExecSupportsMarkRestore (which should be updated if this code
> isn't removed).
>
> RecursiveScan claims to support backwards fetch, but does not in fact
> contain code to do so. (Given that it will always be underneath
> Recursion, which can't do backwards fetch, I see little point in adding
> such code; fix execAmi.c instead.)
>
> ExecInitRecursion doesn't seem to be on the same page about whether
> it supports backward scan as execAmi.c, either.
>
> This comment in nodeRecursivescan.c seems bogus:
> /*
> * Do not initialize scan tuple type, result tuple type and
> * projection info in ExecInitRecursivescan. These types are
> * initialized after initializing Recursion node.
> */
> because the code seems to be doing exactly what the comment says it
> doesn't.
>
> Numerous comments appear to have been copied-and-pasted and not modified
> from the original. Somebody will have to go over all that text.
>
> ruleutils.c fails completely for non-recursive WITH. It *must* regenerate
> such a query with a WITH clause, not as a flattened subquery which is what
> you seem to be doing. This isn't negotiable because the semantics are
> different. This will mean at least some change in the parsetree
> representation. Perhaps we could add a bool to subquery RTEs to mark them
> as coming from a nonrecursive WITH? The tests added for RTE_RECURSIVE
> seem a bit ugly too. If it thinks that can't happen it should Assert so,
> not just fall through silently.
>
> commentary for ParseState.p_ctenamespace is gratuitously unlike the
> comment style for the other fields, and p_recursive_namespace isn't
> documented at all.
>
> ParseState.p_in_with_clause is unused, should be removed.

Done.

> The WithClause struct definition is poorly commented. It should be
> stated that it is used only pre-parse-analysis (assuming that continues
> to be true after you get done fixing ruleutils.c...), and it doesn't
> say what the elements of the subquery list are (specifically, what
> node type). A lot of the other added structs and fields could use
> better commenting too.
>
> For that matter "subquery" is a poor name for WithClause's list of CTEs,
> especially so since it's hard to search for. It should be a plural name
> and I'd be inclined to use something like "ctes" not "subqueries".
> The term "subquery" is too overloaded already, so any place you can
> refer to a WITH-list member as a CTE you should do so.
>
> WithClause node may need a location field, and almost certainly has to
> be handled somehow in exprLocation().
>
> The error reports in parse_cte.c *desperately* need error locations.
>
> Why does transformWithClause do parse_sub_analyze twice?
> I'm not sure that's even safe, and it's surely unnecessary.
> Also, what happens if a subquery isn't a SelectStmt? Silently
> doing nothing doesn't seem like a good plan there.
>
> Why are we putting essentially the same information into both
> p_recursive_namespace and p_ctenamespace? Is there really a need
> for both lists? The code added to transformFromClauseItem
> seems quite wrong since it searches both lists even if it found a
> match in the first one. This whole area looks like it needs
> refactoring.
>
> Costing is all bogus, but we knew that...
>
> Why does set_recursion_pathlist think that the subquery might have
> useful pathkeys? We know it must always be a UNION ALL, no?
>
> PlanState.has_recursivescan seems like a complete kluge. Can't it just be
> removed? It looks to me like it is working around bugs that hopefully aren't
> there anymore. There is certainly no reason why a recursive CTE should be
> more in need of rescanning than any other kind of plan. If it is needed then
> the current implementation is completely broken anyway, since it would only
> detect a RecursiveScan node that is directly underneath an agg or hash node.
>
> Please pay some attention to keeping things in logical, consistent orders.
> For instance the withClause field was inserted into _copySelectStmt()
> in a different place from where it was inserted in the actual struct
> declaration, which is confusing.
>
> parseTypeString() ought to check for null withClause.

Done.

> expression_tree_walker/mutator support seems entirely broken for
> RTE_RECURSIVE RTEs. Shouldn't it be recursing into the subquery?
>
> Missed adding non_recursive_query to the "zap unneeded substructure" part
> of set_plan_references (assuming it really is unneeded).
>
> There seem to be quite a number of places where RTE_SUBQUERY RTEs
> are handled but the patch fails to add RTE_RECURSIVE handling ...
>
> It's a really bad idea to use RTE subquery field over again for
> RTE_RECURSIVE, especially without any comment saying you did that.
> I would suggest two pointers in the RTE_RECURSIVE field list instead.
>
> Do we really have to make RECURSIVE a fully reserved keyword?
> (Actually, the patch makes it worse than reserved, by failing to
> add it to the reserved_keywords list.)

Changed to unreserved keyword.

> checkCteTargetList is completely broken: it will only notice illegal
> sublinks that are at the very top level of a targetlist expression.
> checkWhereClause is very far short of adequate as well. Need to recurse
> here, or find some other way. Given that the subexpressions haven't been
> analyzed yet, this seems a bit messy --- expression_tree_walker doesn't
> know about pre-analysis node trees, so you can't use it. I'd suggest
> replacing this whole set of routines with just one recursive routine that
> doesn't make pre-assumptions about which node types can be found where.
> Alternatively, is there any way of delaying the validity checks until
> *after* parse analysis of the expressions, so that you could use
> expression_tree_walker et al?
>
> BTW, it seems like a lot of the logic there could be simplified by depending
> on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE.
> There are a number of places that are taking the larger of two values
> in baroque, hard-to-follow ways.
>
> I wonder if checkCteSelectStmt is detecting nonlinearity correctly.
> Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to
> miss the problem in something like (self union (self union other)) ?
> Maybe what you really need is a bitmask:
> NON_RECURSIVE = 0,
> RECURSIVE_SELF = 1,
> RECURSIVE_OTHER = 2,
> RECURSIVE_BOTH = 3 /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */
> and then you can merge two values via OR instead of MAX.
>
> regards, tom lane

Attachment Content-Type Size
recursive_query.patch.gz application/octet-stream 30.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-09-18 04:42:42 Re: optimizing CleanupTempFiles
Previous Message Zhe He 2008-09-18 02:39:55 Where is Aggregation Attribute