Re: Removing unneeded self joins

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2019-03-22 14:39:40
Message-ID: 31c2ee33-5db1-a43c-b964-8e2624020ae9@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/21/19 01:54, David Rowley wrote:
> I really just don't think checking the unique indexes match is going
> to cut it. You should be looking for a unique index where the join
> clauses match on either side of the join, not looking independently
> and then checking the indexes are the same ones.

The bug you mention later is an implementation bug that can be fixed (I
will expand on that below). Besides this, do you think current self-join
detection algorithm has fundamental correctness problems? I am not aware
of such problems and this algorithm reflects my understanding of what
constitutes a removable self-join, so it would be helpful if you could
explain what exactly is wrong with it.

Your alternative idea sounds plausible, but I won't know it's correct
for sure until I implement it, which is a nontrivial amount of work. I
am also concerned that we will have to redo calculations similar to
innerrel_is_unique(), because having near-zero overhead is a hard
prerequisite for this patch, as was discussed upthread.

In short, I am reluctant to implement a new approach to detection until
I understand why the current one is fundamentally broken.

> Here's an example of what can go wrong with your current code:

This is a bug indeed. Unique index search is not exhaustive, so if many
indexes match the join quals, we might not find the same index for both
sides. I think this can be overcome if we switch to exhaustive search in
relation_has_unique_index_for, and then try to find matching indexes in
is_unique_self_join. Please see the new patch for the fix.

> I also think this way would give you the subquery GROUP BY / DISTINCT
> self join removal for just about free.

Could you explain how exactly we can generalize join removal to the
DISTINCT/GROUP BY case? I understand the removable self-joins as having
the same row on both sides of join, as identified by ctid, but I'm not
sure how to apply this to subqueries. Your earlier DISTINCT example
looked like it needed a different kind of join removal, with different
conditions for when it applies (please see my previous letter for details).

Another thing I'd like to mention is that this patch splits in two
independent parts, the detection of self-joins and their removal. While
we are looking for agreement on the detection part, could you also
review the removal part? I'm sure it has its own share of problems.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v13-0001-Remove-unique-self-joins.patch text/x-patch 70.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-22 15:12:07 Re: Ordered Partitioned Table Scans
Previous Message David Rowley 2019-03-22 14:30:27 Re: Ordered Partitioned Table Scans