Re: Performance improvement for joins where outer side is unique

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2016-03-12 17:16:50
Message-ID: CA+TgmoYJc+hBsvCQiPpuHM6sCRc1NGq5FWO10yeJtSeoH4ji8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 11, 2016 at 4:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So I started re-reading this thread in preparation for looking at the
> patch, and this bit in your initial message jumped out at me:
>
>> In all of our join algorithms in the executor, if the join type is SEMI,
>> we skip to the next outer row once we find a matching inner row. This is
>> because we don't want to allow duplicate rows in the inner side to
>> duplicate outer rows in the result set. Obviously this is required per SQL
>> spec. I believe we can also skip to the next outer row in this case when
>> we've managed to prove that no other row can possibly exist that matches
>> the current outer row, due to a unique index or group by/distinct clause
>> (for subqueries).
>
> I wondered why, instead of inventing an extra semantics-modifying flag,
> we couldn't just change the jointype to *be* JOIN_SEMI when we've
> discovered that the inner side is unique.
>
> Now of course this only works if the join type was INNER to start with.
> If it was a LEFT join, you'd need an "outer semi join" jointype which
> we haven't got at the moment. But I wonder whether inventing that
> jointype wouldn't let us arrive at a less messy handling of things in
> the executor and EXPLAIN. I'm not very enamored of plastering this
> "match_first_tuple_only" flag on every join, in part because it doesn't
> appear to have sensible semantics for other jointypes such as JOIN_RIGHT.
> And I'd really be happier to see the information reflected by join type
> than a new line in EXPLAIN, also.

The new join pushdown code in postgres_fdw does not grok SEMI and ANTI
joins because there is no straightforward way of reducing those back
to SQL. They can originate in multiple ways and not all of those can
be represented easily. I think it would be nice to do something to
fix this. For example, if a LEFT join WHERE outer_column IS NULL
turns into an ANTI join, it would be nice if that were marked in some
way so that postgres_fdw could conveniently emit it in the original
form.

Maybe the people who have been working on that patch just haven't been
creative enough in thinking about how to solve this problem, but it
makes me greet the idea of more join types that don't map directly
back to SQL with somewhat mixed feelings.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-12 17:19:30 Re: [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.
Previous Message Robert Haas 2016-03-12 16:56:24 Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat