Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: "SELECT ... FROM DUAL" is not quite as silly as it appears
Date: 2018-10-14 22:04:12
Message-ID: 32440.1539554652@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Mar 15, 2018 at 11:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We've long made fun of Oracle(TM) for the fact that if you just want
>> to evaluate some expressions, you have to write "select ... from dual"
>> rather than just "select ...". But I've realized recently that there's
>> a bit of method in that madness after all. Specifically, having to
>> cope with FromExprs that contain no base relation is fairly problematic
>> in the planner. prepjointree.c is basically unable to cope with
>> flattening a subquery that looks that way, although we've inserted a
>> lot of overly-baroque logic to handle some subsets of the case (cf
>> is_simple_subquery(), around line 1500). If memory serves, there are
>> other places that are complicated by the case.
>>
>> Suppose that, either in the rewriter or early in the planner, we were
>> to replace such cases with nonempty FromExprs, by adding a dummy RTE
>> representing a table with no columns and one row. This would in turn
>> give rise to an ordinary Path that converts to a Result plan, so that
>> the case is handled without any special contortions later. Then there
>> is no case where we don't have a nonempty relids set identifying a
>> subquery, so that all that special-case hackery in prepjointree.c
>> goes away, and we can simplify whatever else is having a hard time
>> with it.

> Good idea.

Here's a proposed patch along those lines. Some notes for review:

* The new RTE kind is "RTE_RESULT" after the kind of Plan node it will
produce. I'm not entirely in love with that name, but couldn't think
of a better idea.

* I renamed the existing ResultPath node type to GroupResultPath to
clarify that it's not used to scan RTE_RESULT RTEs, but just for
degenerate grouping cases. RTE_RESULT RTEs (usually) give rise to plain
Path nodes with pathtype T_Result. It doesn't work very well to try to
unify those two cases, even though they give rise to identical Plans,
because there are different rules about where the associated quals live
during query_planner.

* The interesting changes are in prepjointree.c; almost all the rest
of the patch is boilerplate to support RTE_RESULT RTEs in mostly the
same way that other special RTE-scan plan types are handled in the
planner. In prepjointree.c, I ended up getting rid of the original
decision to try to delete removable RTEs during pull_up_subqueries,
and instead made it happen in a separate traversal of the join tree.
That's a lot less complicated, and it has better results because we
can optimize more cases once we've seen the results of expression
preprocessing and outer-join strength reduction.

* I tried to get rid of the empty-jointree special case in query_planner
altogether. While the patch works fine that way, it makes for a
measurable slowdown in planning trivial queries --- I saw close to 10%
degradation in TPS rate for a pgbench test case that was just
$ cat trivialselect.sql
select 2+2;
$ pgbench -n -T 10 -f trivialselect.sql
So I ended up putting back the special case, but it's much less of a
cheat than it was before; the RelOptInfo it hands back is basically the
same as the normal path would produce. For me, the patch as given is
within the noise level of being the same speed as HEAD on this case.

* There's a hack in nodeResult.c to prevent the executor from crashing
on a whole-row Var for an RTE_RESULT RTE, which is something that the
planner will create in SELECT FOR UPDATE cases, because it thinks it
needs to provide a ROW_MARK_COPY image of the RTE's output. We might
be able to get rid of that if we could teach the planner that it need
not bother rowmarking RESULT RTEs, but that seemed like it would be
really messy. (At the point where the decision is made, we don't know
whether a subquery might end up as just a RESULT, or indeed vanish
entirely.) Since I couldn't measure any reproducible penalty from
having the extra setup cost for a Result plan, I left it like this.

* There are several existing test cases in join.sql whose plans change
for the better with this patch, so I didn't really think we needed any
additional test cases to show that it's working.

* There are a couple of cosmetic changes in EXPLAIN output as a result
of ruleutils.c seeing more RTEs in the plan's rtable than it did before,
causing it to decide to table-qualify Var names in more cases. We could
maybe spend more effort on ruleutils.c's heuristic for when to qualify,
but that seems like a separable problem, and anyway it's only cosmetic.

I'll add this to the next CF.

regards, tom lane

Attachment Content-Type Size
get-rid-of-empty-jointrees-1.patch text/x-diff 92.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-10-14 22:09:23 Re: [RFC] Removing "magic" oids
Previous Message Thomas Munro 2018-10-14 21:38:04 Re: B-tree cache prefetches