Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2017-01-30 21:32:50
Message-ID: CAKJS1f9Hc1RvkYOH=iyaWBb79ExBhTzL560=Ckuc8AoPwa-ZrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 January 2017 at 04:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> I can make this change, but before I do I just want to point that I
>> don't think what you've said here is entirely accurate.
>
>> Let's assume unique joins are very common place, and join removals are
>> not so common. If a query has 5 left joins, and only one of which can
>> be removed, then the new code will most likely perform 5 unique join
>> checks, whereas the old code would perform 9, as those unique checks
>> are performed again once the 1 relation is removed for the remaining
>> 4.
>
> I'm not following. If the join removal code had reached the stage of
> making a uniqueness check, and that check had succeeded, the join would be
> gone and there would be no repeat check later. If it didn't reach the
> stage of making a uniqueness check, then again there's no duplication.

I had forgotten the unique check was performed last. In that case the
check for unused columns is duplicated needlessly each time. But let's
drop it, as putting the code back is not making things any worse.

> There will be some advantage in making a negative cache entry if join
> removal performs a uniqueness check that fails, but I don't really see
> why that's hard. It does not seem like removal of a relation could
> cause another rel to become unique that wasn't before, so keeping
> negative cache entries across join removals ought to be safe.

I don't think that's possible. The whole point that the current join
removal code retries to remove joins which it already tried to remove,
after a successful removal is exactly because it is possible for a
join to become provability unique on the removal of another join. If
you remove that retry code, a regression test fails. I believe this is
because there initially would have been more than one RHS rel, and the
bitmap singleton check would have failed. After all a unique index is
on a single relation, so proofs don't exist when >1 rel is on the RHS.

In any case, it's not possible to use the cache with join removals, as
we use the otherquals for unique tests in join removals, but we can't
for unique joins, as I'm adding those optimizations to the executor
which rely on the uniqueness being on the join condition alone, so we
can skip to the next outer tuple on matched join, but unmatched quals.
If I change how join removals work in that regard it will disallow
join removals where they were previously possible. So that's a no go
area.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-01-30 21:42:10 Re: Declarative partitioning - another take
Previous Message Tom Lane 2017-01-30 20:41:33 Re: Improvements in psql hooks for variables