Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-11-15 23:19:29
Message-ID: CAApHDvpWtMw6L6kcvymp-r8xL7j__6nC_+OaCFd-rgj2rc_rzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 15 October 2014 11:03, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> > The explain analyze from the above query looks like:
> > test=# explain (analyze, costs off, timing off) select count(*) from t1
> > inner join t2 on t1.t2_id=t2.id;
> > QUERY PLAN
> > ------------------------------------------------------------------
> > Aggregate (actual rows=1 loops=1)
> > -> Nested Loop (actual rows=1000000 loops=1)
> > -> Seq Scan on t1 (actual rows=1000000 loops=1)
> > -> Index Only Scan using t2_pkey on t2 (never executed)
> > Index Cond: (id = t1.t2_id)
> > Heap Fetches: 0
> > Execution time: 124.990 ms
> > (7 rows)
> >
> > As you can see the scan on t2 never occurred.
>
> Very good, happy to see this happening (yay FKs!) and with
> PostgreSQL-style rigour.
>
>
:)

> I've reviewed the patch from cold and I have a few comments.
>
>
Thanks!

> The plan you end up with here works quite differently from left outer
> join removal, where the join is simply absent. That inconsistency
> causes most of the other problems I see.
>
> I propose that we keep track of whether there are any potentially
> skippable joins at the top of the plan. When we begin execution we do
> a single if test to see if there is run-time work to do. If we pass
> the run-time tests we then descend the tree and prune the plan to
> completely remove unnecessary nodes. We end with an EXPLAIN and
> EXPLAIN ANALYZE that looks like this
>
> > QUERY PLAN
> > ------------------------------------------------------------------
> > Aggregate (actual rows=1 loops=1)
> > -> Seq Scan on t1 (actual rows=1000000 loops=1)
>
> Doing that removes all the overheads and complexity; it also matches
> how join removal currently works.
>
>
This sounds much cleaner than what I have at the moment, although, you say
EXPLAIN would look like that... I don't think that's quite true as the
EXPLAIN still would have the un-pruned version, as the pruning would be
done as executor start-up. Would it cause problems to have the EXPLAIN have
a different looking plan than EXPLAIN ANALYZE?

I'll need to look into how the plan is stored in the case of PREPARE
statements, as no doubt I can't go vandalising any plans that are stored in
the PREPARE hashtable. I'd need to make a copy first, unless that's already
done for me. But I guess I'd only have to do that if some flag on
PlannerInfo hasSkippableNodes was true, so likely the overhead of such a
copy would be regained by skipping some joins.

> The alternative is accepting some pretty horrible additional code in
> most join types, plus a small regression on nested loop joins which I
> would have to call out as regrettably unacceptable. (Horrible in this
> sense that we don't want that code, not that David's code is poor).
>
>
Yeah it is quite horrid. I did try and keep it as simple and as
non-invasive as possible, but for nest loop it seemed there was just no
better way.

> The tests on the patch are pretty poor. If we should use EXPLAINs to
> show a join removal that works and a join removal that fails. With a
> few of the main permutations.
>
>
Agreed. To be honest I abandoned the tests due to a problem with EXPLAIN
ANALYZE outputting the variable timing information at the bottom. There's
no way to disable this! So that makes testing much harder.

I added myself to the list of complainers over here ->
http://www.postgresql.org/message-id/CAApHDvoVzBTzLJbD9VfaznWo6jooK1k6-7rFQ8zYM9H7ndCcSA@mail.gmail.com
but the proposed solution (diff tool which supports regex matching) is not
all that simple, and I've not gotten around to attempting to make one yet.

I've also attached a rebased patch, as the old one is no longer applying.

Regards

David Rowley

Attachment Content-Type Size
inner_join_removals_2014-11-16_3a40b4f.patch application/octet-stream 52.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-11-15 23:35:39 Re: Improve automatic analyze messages for inheritance trees
Previous Message Andrew Dunstan 2014-11-15 22:56:40 Re: controlling psql's use of the pager a bit more