Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-04-07 01:07:32
Message-ID: CAKJS1f_BxfNe3ixfz4S-g2RTodvjk82edcv9O9WtBqc9eUbWSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7 April 2016 at 08:01, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 7 April 2016 at 04:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Starting to look at this again. I wonder, now that you have the generic
>> caching mechanism for remembering whether join inner sides have been
>> proven unique, is it still worth having the is_unique_join field in
>> SpecialJoinInfo? It seems like that's creating a separate code path
>> for special joins vs. inner joins that may not be buying us much.
>> It does potentially save lookups in the unique_rels cache, if you already
>> have the SpecialJoinInfo at hand, but I'm not sure what that's worth.
>
> I quite like that field where it is, as it should make
> remove_useless_joins() a bit more efficient, as after a LEFT JOIN is
> removed, the previous code would go off and try to make sure all the
> joins are unique again, but now we cache that, and save it from having
> to bother doing that again, on joins already marked as unique.
>
> Certainly changing that would mean one less special case in
> joinpath.c, as the JOIN_LEFT case can be handle the same as the other
> cases, although it looks like probably, if I do change that, then I'd
> probably move is_innerrel_unique_for() into analyzejoins.c, and put
> the special case for JOIN_LEFT in that function, so that it calls
> specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in
> the unique_rels cache if the result comes back positive, and in the
> non_unique_rels cache if negative... But it seems a bit crazy to go to
> the trouble or all that caching, when we can just throw the result in
> a struct field in the case of Special Joins. Maybe we could just hide
> both the new joinpath.c functions in analyzejoins.c and call it quits.
> It's not as if there's no special cases for JOIN_LEFT in that file.
>
>> Also, as I'm looking around at the planner some more, I'm beginning to get
>> uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far
>> as the executor is concerned, no doubt, but there's a lot of planner
>> expectations about the use of JOIN_SEMI that we'd be violating. One is
>> that there should be a SpecialJoinInfo for any SEMI join. Another is that
>> JOIN_SEMI can be implemented by unique-ifying the inner side and then
>> doing a regular inner join; that's not a code path we wish to trigger in
>> these situations. The patch might avoid tripping over these hazards as it
>> stands, but it seems fragile, and third-party FDWs could easily contain
>> code that'll be broken. So I'm starting to feel that we'd better invent
>> two new JoinTypes after all, to make sure we can distinguish plain-join-
>> with-inner-side-known-unique from a real SEMI join when we need to.
>>
>> What's your thoughts on these matters?
>
> I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before,
> but that was for EXPLAIN reasons. So I do think it's better to have
> JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that
> change, but unsure on how EXPLAIN will display these now. I'd need to
> pull out my tests if we don't have anything to show in EXPLAIN, and
> I'd really rather have tests for this.

I've attached an updated patch which introduces JOIN_INNER_UNIQUE and
JOIN_LEFT_UNIQUE. So unique inner joins no longer borrow JOIN_SEMI.

I also made some changes around the is_unique_join flag in
SpecialJoinInfo. I've changed this to become optimal_jointype, which
is initially set to jointype and updated by what used to be called
mark_unique_joins(), now called optimize_outerjoin_types(). The LEFT
JOIN removal code now only bothers with special joins with
optimal_jointype as JOIN_LEFT_UNIQUE. This still saves any double work
analyzing the unique properties of LEFT JOINs. I've moved the two new
functions I put in joinpath.c into analyzejoins.c

In EXPLAIN, I named these new join types "Unique Inner" and "Unique
Left". Going by a comment in explain.c, there's some historic reason
that we display "Hash Join" rather than "Hash Inner Join", and "Nested
Loop", rather than "Nested Loop Join" or "Nested Loop Inner Join". I
wasn't quite sure if Unique Inner Joins should use a similar shorthand
form, so I didn't touch that code. We'll get "Nested Loop Unique Inner
Join" now instead of "Nested Loop". I wasn't able to think of some
equivalent shorthand form that made sense.

I know we're close to the feature freeze, so I just want to say that
I'll be AFK starting 5:30am Friday New Zealand time (13:30 on the 8th
New York time), until Sunday ~4pm. . I really hope that if this needs
any tweaking it will be minor. I'll check my email before I leave on
Friday in the hope that if there's anything, then I can quickly fix it
up before I go, but time will be short.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
unique_joins_2016-04-07.patch application/octet-stream 66.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2016-04-07 01:45:51 pgsql: Use GRANT system to manage access to sensitive functions
Previous Message Kyotaro HORIGUCHI 2016-04-07 00:51:34 Re: postgres_fdw : altering foreign table not invalidating prepare statement execution plan.