Re: Performance improvement for joins where outer side is unique

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-04-06 16:05:57
Message-ID: 21641.1459958757@sss.pgh.pa.us
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> In the last patch I failed to notice that there's an alternative
> expected results file for one of the regression tests.
> The attached patch includes the fix to update that file to match the
> new expected EXPLAIN output.

Starting to look at this again. I wonder, now that you have the generic
caching mechanism for remembering whether join inner sides have been
proven unique, is it still worth having the is_unique_join field in
SpecialJoinInfo? It seems like that's creating a separate code path
for special joins vs. inner joins that may not be buying us much.
It does potentially save lookups in the unique_rels cache, if you already
have the SpecialJoinInfo at hand, but I'm not sure what that's worth.

Also, as I'm looking around at the planner some more, I'm beginning to get
uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far
as the executor is concerned, no doubt, but there are a lot of planner
expectations about the use of JOIN_SEMI that we'd be violating. One is
that there should be a SpecialJoinInfo for any SEMI join. Another is that
JOIN_SEMI can be implemented by unique-ifying the inner side and then
doing a regular inner join; that's not a code path we wish to trigger in
these situations. The patch might avoid tripping over these hazards as it
stands, but it seems fragile, and third-party FDWs could easily contain
code that'll be broken. So I'm starting to feel that we'd better invent
two new JoinTypes after all, to make sure we can distinguish plain-join-
with-inner-side-known-unique from a real SEMI join when we need to.
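
To make the distinction concrete, here is a rough sketch of what separate
join types might buy. Everything in it is an assumption for illustration:
the enum is a simplified stand-in rather than the planner's real JoinType,
and the names SK_JOIN_INNER_UNIQUE and SK_JOIN_LEFT_UNIQUE and both helper
functions are hypothetical, not taken from the patch.

```c
#include <stdbool.h>

/*
 * Simplified stand-in for the planner's JoinType enum (hypothetical;
 * the real enum has more members and different names for the new ones).
 */
typedef enum SketchJoinType
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_SEMI,
    SK_JOIN_INNER_UNIQUE,   /* hypothetical: inner join, inner side proven unique */
    SK_JOIN_LEFT_UNIQUE     /* hypothetical: left join, inner side proven unique */
} SketchJoinType;

/*
 * Executor-side question: may we stop scanning the inner side after the
 * first match for each outer row?  True for a real semijoin and for the
 * two "known unique" variants alike.
 */
bool
sk_stop_after_first_match(SketchJoinType jt)
{
    return jt == SK_JOIN_SEMI ||
           jt == SK_JOIN_INNER_UNIQUE ||
           jt == SK_JOIN_LEFT_UNIQUE;
}

/*
 * Planner-side question: is this a real semijoin, i.e. one for which a
 * SpecialJoinInfo is expected to exist and for which unique-ifying the
 * inner side followed by a regular inner join is a valid strategy?
 * With distinct join types this is false for the "known unique" variants.
 */
bool
sk_is_real_semijoin(SketchJoinType jt)
{
    return jt == SK_JOIN_SEMI;
}
```

With separate values, code that tests for a semijoin (including code in
third-party FDWs) would not accidentally match the unique-inner cases,
while the executor could still treat all three the same way.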

What are your thoughts on these matters?

regards, tom lane
