Re: Performance improvement for joins where outer side is unique

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2017-01-27 16:44:20
Message-ID: 1545.1485535460@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> hmm. I'm having trouble understanding why this is a problem for Unique
>> joins, but not for join removal?

> Ah, you know what, that's just mistaken. I was thinking that we
> short-circuited the join on the strength of the hash (or merge) quals
> only, but actually we check all the joinquals first. As long as the
> uniqueness proof uses only joinquals and not conditions that will end up
> as otherquals, it's fine.

Actually, after thinking about that some more, it seems to me that there
is a performance (not correctness) issue here: suppose that we have
something like

select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y

If there's a unique index on t2.x, we'll be able to mark the join
inner-unique. However, short-circuiting would only occur after
finding a row that passes both joinquals. If the y condition is
true for only a few rows, this would pretty nearly disable the
optimization. Ideally we would short-circuit after testing the x
condition only, but there's no provision for that.

This might not be a huge problem for outer joins. My sense of typical
SQL style is that the joinquals (ON conditions) are likely to be
exactly what you need to prove inner uniqueness, while random other
conditions will be pushed-down from WHERE and hence will be otherquals.
But I'm afraid it is quite a big deal for inner joins, where we dump
all available conditions into the joinquals. We might need to rethink
that choice.

At least for merge and hash joins, it's tempting to think about a
short-circuit test being made after testing just the merge/hash quals.
But we'd have to prove uniqueness using only the merge/hash quals,
so the planning cost might be unacceptably high --- particularly for
merge joins which often don't use all available mergeable quals.
In the end I think we probably want to keep the short-circuit in the
same place where it is for SEMI/ANTI cases (which have to have it
exactly there for semantic correctness).

I'm afraid though that we may have to do something about the
irrelevant-joinquals issue in order for this to be of much real-world
use for inner joins.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-01-27 16:49:33 Re: pg_ls_dir & friends still have a hard-coded superuser check
Previous Message Stephen Frost 2017-01-27 16:34:54 Re: pg_ls_dir & friends still have a hard-coded superuser check