Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: "SELECT ... FROM DUAL" is not quite as silly as it appears
Date: 2019-01-04 19:48:57
Message-ID: 20472.1546631337@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> I've just looked over the v4 patch. I agree that having an RTE for a
> from-less SELECT seems like a good idea.

Thanks for looking!

> While reading the patch, I noted the following:
> 1. It's more efficient to use bms_next_member as it does not need to
> re-skip 0 words on each iteration. (Likely bms_first_member() is not
> needed anywhere in the code base)

Sure. As the comment says, this isn't meant to be super efficient for
multiple removable RTEs, but we might as well use the other API.

> 2. The following comment seems to indicate that we can go ahead and
> make parallelise the result processing, but the code still defers to
> the checks below and may still end up not parallelising if say,
> there's a non-parallel safe function call in the SELECT's target list.

Adjusted the comment.

> 3. You may as well just ditch the variable and just do:
> Assert(rel->relid > 0);
> Assert(planner_rt_fetch(rel->relid, root)->rtekind == RTE_RESULT);
> instead of:
> RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;
> There are a few other cases doing just that in costsize.c

Meh. I'm not really on board with doing it that way, as it'll just
mean more to change if the code is ever modified to look at other
fields of the RTE. Still, I see your point that other places in
costsize.c are doing it without PG_USED_FOR_ASSERTS_ONLY, so changed.

> 4. I don't quite understand why this changed in join.out

> @@ -3596,7 +3588,7 @@ select t1.* from
>>>>>>>>>>>>>>>>>> Hash Right Join
> Output: i8.q2
> Hash Cond: ((NULL::integer) = i8b1.q2)
> - -> Hash Left Join
> + -> Hash Join

> Can you explain why that's valid?

Excellent question. The reason that plan changed is the logic I added
in find_nonnullable_rels_walker:

+ * If the PHV's syntactic scope is exactly one rel, it will be forced
+ * to be evaluated at that rel, and so it will behave like a Var of
+ * that rel: if the rel's entire output goes to null, so will the PHV.

In this case there's a PHV wrapped around b2.d2, and this change allows
reduce_outer_joins_pass2 to detect that the i8-to-b2 left join can
be simplified to an inner join because the qual just above it (on the
left join of b1 to i8/b2) is strict for b2; that is, the condition
(b2.d2 = b1.q2) cannot succeed for a null-extended i8/b2 row.

If you reverse out just that change you'll see why I added it: without it,
the plan for the earlier "test a corner case in which we shouldn't apply
the star-schema optimization" isn't optimized as much as I thought it
should be.

v5 attached; this responds to your comments plus Alexander's earlier
gripe about not getting a clean build with --disable-cassert.
No really substantive changes though.

regards, tom lane

Attachment Content-Type Size
get-rid-of-empty-jointrees-5.patch text/x-diff 96.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2019-01-04 19:55:52 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Previous Message Alvaro Herrera 2019-01-04 19:35:51 Re: Use atexit() in initdb and pg_basebackup