Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-04-06 20:01:21
Message-ID: CAKJS1f9OrAkVywOvk5OhCoXOssGUmtYOQyhU8jFr_bZL+Xn6zQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7 April 2016 at 04:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> In the last patch I failed to notice that there's an alternative
>> expected results file for one of the regression tests.
>> The attached patch includes the fix to update that file to match the
>> new expected EXPLAIN output.
>
> Starting to look at this again. I wonder, now that you have the generic
> caching mechanism for remembering whether join inner sides have been
> proven unique, is it still worth having the is_unique_join field in
> SpecialJoinInfo? It seems like that's creating a separate code path
> for special joins vs. inner joins that may not be buying us much.
> It does potentially save lookups in the unique_rels cache, if you already
> have the SpecialJoinInfo at hand, but I'm not sure what that's worth.

I quite like that field where it is, as it should make
remove_useless_joins() a bit more efficient, as after a LEFT JOIN is
removed, the previous code would go off and try to make sure all the
joins are unique again, but now we cache that, and save it from having
to bother doing that again, on joins already marked as unique.

Certainly changing that would mean one less special case in
joinpath.c, as the JOIN_LEFT case can be handle the same as the other
cases, although it looks like probably, if I do change that, then I'd
probably move is_innerrel_unique_for() into analyzejoins.c, and put
the special case for JOIN_LEFT in that function, so that it calls
specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in
the unique_rels cache if the result comes back positive, and in the
non_unique_rels cache if negative... But it seems a bit crazy to go to
the trouble or all that caching, when we can just throw the result in
a struct field in the case of Special Joins. Maybe we could just hide
both the new joinpath.c functions in analyzejoins.c and call it quits.
It's not as if there's no special cases for JOIN_LEFT in that file.

> Also, as I'm looking around at the planner some more, I'm beginning to get
> uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far
> as the executor is concerned, no doubt, but there's a lot of planner
> expectations about the use of JOIN_SEMI that we'd be violating. One is
> that there should be a SpecialJoinInfo for any SEMI join. Another is that
> JOIN_SEMI can be implemented by unique-ifying the inner side and then
> doing a regular inner join; that's not a code path we wish to trigger in
> these situations. The patch might avoid tripping over these hazards as it
> stands, but it seems fragile, and third-party FDWs could easily contain
> code that'll be broken. So I'm starting to feel that we'd better invent
> two new JoinTypes after all, to make sure we can distinguish plain-join-
> with-inner-side-known-unique from a real SEMI join when we need to.
>
> What's your thoughts on these matters?

I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before,
but that was for EXPLAIN reasons. So I do think it's better to have
JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that
change, but unsure on how EXPLAIN will display these now. I'd need to
pull out my tests if we don't have anything to show in EXPLAIN, and
I'd really rather have tests for this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Feld, Michael (IMS) 2016-04-06 20:21:22 Re: pg_upgrade error regarding hstore operator
Previous Message Robbie Harwood 2016-04-06 19:22:15 Re: [PATCH v12] GSSAPI encryption support