Re: Removing unneeded self joins

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2019-03-25 13:45:02
Message-ID: 89c8a17d-01f8-a03d-7fb8-0a8a070be47c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/25/19 07:07, David Rowley wrote:
> You had commented the test with:
>
> -- If index conditions are different for each side, we won't select the same
> -- row on both sides, so the join can't be removed.
>
> but I don't quite understand why we can't remove the join in this
> situation.

My rationale was that we're not exactly removing the join, but replacing
it with a scan. So it is not enough to just have a single row on each
side, it must be the same physical row. In this example, before the
transformation, t1.b is not equal to t2.b, but they become equal
afterwards. This looks somewhat wrong. On the other hand, if the
conditions are different, the resulting condition is going to evaluate
to constant false and we won't get any rows, so maybe it's OK.

This brings me again to the question of what are the conditions for join
removal. If the formulation with indexes is not general enough, what do
we use instead? I guess it could be something like this:

1. Given the (btree equality) join and restriction clauses, both sides
are unique on the same set of columns. That is, if we fix the values of
these columns, both sides have at most one matching row.

    a. For each of these columns, we have either

        i) a join clause that equates some expression referencing the
outer column to the same expression referencing the same inner column.

        ii) a clause for each relation that equates the same expression
referencing the outer and inner column to some other arbitrary
expression, possibly a different one for each side. This expression may
be a Const or some expression that references a Var of some third relation.

2. All the resulting columns can be calculated using either side of the
join. For now, just require that both sides are base relations that
refer to the same physical relation.

Two points are not clear to me here:

1. We don't handle join clauses to third relations, but can they be
treated the same way we treat Consts?

2. Can we simplify the join when we don't have any join clauses and only
have Consts? Or should we have at least one join clause that equates the
same inner and outer column? Why is one join clause enough?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-25 13:53:05 Re: Feature: Add Greek language fulltext search
Previous Message Robert Haas 2019-03-25 13:28:57 Re: Protect syscache from bloating with negative cache entries