Re: Allowing join removals for more join types

From: David Rowley <dgrowley(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>
Subject: Re: Allowing join removals for more join types
Date: 2014-06-23 11:06:59
Message-ID: CAHoyFK9LBdr9Oa1F+urJDU8aLrWVX8zQQJxWw8zoDWvBMA0V4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 June 2014 09:31, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 22 June 2014 12:51, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> > Looks good on initial look.
>
> Tests 2 and 3 seem to test the same thing.
>
>
Ok, I've removed test 2 and kept test 3 which is the DISTINCT a+b test.

> There are no tests which have multiple column clauselist/sortlists,
> nor tests for cases where the clauselist is a superset of the
> sortlist.
>
>
I've added:
SELECT a.id FROM a LEFT JOIN (SELECT b.id,b.c_id FROM b GROUP BY b.id,b.c_id)
b ON a.b_id = b.id AND a.id = b.c_id
but I'm half temped to just add 2 new tables that allow this to be done in
a more sensible way, since c_id is really intended to reference c.id in the
defined tables.

I've also added one where the join condition is a superset of the GROUP BY
clause. I had indented the one with the constant to be this, but probably,
you're right, it should be an actual column since constants are treated
differently.

> Test comments should refer to "join removal" rather than
> "optimization" because we may forget which optimization they are there
> to test.
>
>
Good idea...Fixed.

> It's not clear to me where you get the term "sortclause" from. This is
> either the groupclause or distinctclause, but in the test cases you
> provide this shows this has nothing at all to do with sorting since
> there is neither an order by or a sorted aggregate anywhere near those
> queries. Can we think of a better name that won't confuse us in the
> future?
>
>
I probably got the word "sort" from the function targetIsInSortList, which
expects a list of SortGroupClause. I've renamed the function to
sortlist_is_unique_on_restrictinfo() and renamed the sortclause parameter
to sortlist. Hopefully will reduce confusion about it being an ORDER BY
clause a bit more. I think sortgroupclauselist might be just a bit too
long. What do you think?

> The comment "Since a constant only has 1 value the existence of one here
> will
> + * not cause any duplication of the results. We'll simply ignore it!"
> would be better as "We can ignore constants since they have only one
> value and don't affect uniqueness of results".
>
>
Ok, changed.

> The comment "XXX is this comment still true??" can be removed since
> its just a discussion point.
>
>
Removed.

> The comment beginning "Currently, we only know how to remove left..."
> has rewritten a whole block of text just to add a few words in the
> middle. We should rewrite the comment so it changes as few lines as
> possible. Especially when that comment is going to be changed again
> with your later patches. Better to have it provide a bullet point list
> of things we know how to remove, so we can just add to it later.
>
>
I had thought that I'd put the code for other join types in their own
functions as not all will have a SpecialJoinInfo. In the patch that
contained ANTI and SEMI join support I had renamed the function
join_is_removable() to leftjoin_is_removable() and added a new function for
semi/anti joins.

I've done a re-factor of this comment, although it likely would still need
some small updates around the part where it talks about "left join" later
when I start working on support for other join types. The previous comment
was giving some clarification about returning early when there's no indexes
on the relation, I decided to move this out of that comment and just
include a more general note at the bottom but also add some more detail
about why we're fast pathing out when indexlist is empty.

> Still looks good, other than the above.
>
>
Great. Thanks for reviewing!

I've attached an updated patch and also a delta patch of the changes I've
made since the last version.

Regards

David Rowley

--
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
subquery_leftjoin_removal_v1.2.patch application/octet-stream 18.6 KB
subquery_leftjoin_removal_v1.2_delta.patch application/octet-stream 17.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-06-23 11:08:14 Re: [Fwd: Re: proposal: new long psql parameter --on-error-stop]
Previous Message Abhijit Menon-Sen 2014-06-23 10:51:53 Re: pgaudit - an auditing extension for PostgreSQL