Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-03-13 12:06:33
Message-ID: CAKJS1f8S6FH_eH71SXmo_m2ChSU8LL75pf5cpN4TH=ZQQ0dENg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12 March 2016 at 11:43, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do. That would appear
> to add join order constraints, which we don't want. But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.

I've attached a patch which implements this, although I did call the
new join type JOIN_LEFT_UNIQUE rather than JOIN_SEMI_OUTER.
I'm not all that confident that I've not added handling for
JOIN_LEFT_UNIQUE in places where it's not possible to get that join
type, I did leave out a good number of places where I know it's not
possible. I'm also not sure with too much certainty that I've got all
cases correct, but the regression tests pass. The patch is more
intended for assisting discussion than as a ready to commit patch.

> It seems like the major intellectual complexity here is to figure out
> how to detect inner-side-unique at reasonable cost. I see that for
> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> fine. But for INNER joins it looks like you're just doing it over again
> for every candidate join, and that seems mighty expensive.

I have a rough idea for this, but I need to think of it a bit more to
make sure it's bulletproof.
I also just noticed (since it's been a while since I wrote the patch)
that in add_paths_to_joinrel() that innerrel can naturally be a
joinrel too, and we can fail to find uniqueness in that joinrel. I
think it should be possible to analyse the join rel too and search for
a base rel which supports the distinctness, and then ensure none of
the other rels which make up the join rel can cause tuple duplication
of that rel. But this just causes missed optimisation opportunities.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
unique_joins_82d2a07_2016-03-14.patch application/octet-stream 46.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2016-03-13 12:12:16 Re: Refectoring of receivelog.c
Previous Message Jeff Janes 2016-03-13 07:30:07 Re: multivariate statistics v14