Re: Removing unneeded self joins

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2020-11-28 17:21:17
Message-ID: b32dda0d-41dc-381f-73f0-4702265c97aa@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the review,

On 27.11.2020 21:49, Heikki Linnakangas wrote:
> On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
>> +            /*
>> +             * Process restrictlist to seperate out the self join
>> quals from
>> +             * the other quals. e.g x = x goes to selfjoinquals and a
>> = b to
>> +             * otherjoinquals.
>> +             */
>> +            split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>> +                                 &otherjoinquals);
>> +
>> +            if (list_length(selfjoinquals) == 0)
>> +            {
>> +                /*
>> +                 * Have a chance to remove join if target list
>> contains vars from
>> +                 * the only one relation.
>> +                 */
>
> I don't understand the logic here. If 'selfjoinquals' is empty, it means
> that there is no join qual between the two relations, right? How can we
> ever remove the join in that case? And how does the target list affect
> that? Can you give an example query of that?

Maybe it is a problem of variable naming. Following the idea of David
Rowley, we split quals into two subsets: {x==x} and another, for example
{x=y}.
First set is an trivial case of self-join: if we have unique index on
the attribute 'x', then this join is self-join.
Second set is give us a chance: if right side is unique for right side
of the qual and no vars from right side end up in the target list of the
join, then this is a self-join case. Example:

CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too

>
>> --- a/src/test/regress/expected/join.out
>> +++ b/src/test/regress/expected/join.out
>> @@ -4553,11 +4553,13 @@ explain (costs off)
>>  select p.* from
>>    (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
>>    where p.k = 1 and p.k = 2;
>> -        QUERY PLAN ---------------------------
>> +                   QUERY PLAN
>> +------------------------------------------------
>>   Result
>>     One-Time Filter: false
>> -(2 rows)
>> +   ->  Index Scan using parent_pkey on parent x
>> +         Index Cond: (k = 1)
>> +(4 rows)
>>
>>  -- bug 5255: this is not optimizable by join removal
>>  begin;
>
> That doesn't seem like an improvement...

I investigated this case. It is a planner feature: it simplifies dummy
joins and dummy scans to different plans. Maybe it can cause some
discussion, but this example so rare and doesn't make a problem.

> My general impression of this patch is that it's a lot of code to deal
> with a rare special case. At the beginning of this thread there was
> discussion on the overhead that this might add to planning queries that
> don't benefit, but adding a lot of code isn't nice either, even if the
> performance is acceptable. That's not necessarily a show-stopper, but it
> does make me much less excited about this. I'm not sure what to suggest
> to do about that, except a really vague "Can you make is simpler?"

Currently planner reduces useless outer joins and unique semijoins.
Reduce self join feature continues the development of the planner in the
same direction. For example, it is needed for ORM software.
Most of the code dedicated to removing unnecessary relation and
replacing of one oid with another. We are trying to use
remove_rel_from_query() machinery. Perhaps this will allow us to make
the code shorter.

--
regards,
Andrey Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-11-28 17:36:26 Re: Add Information during standby recovery conflicts
Previous Message Tom Lane 2020-11-28 16:48:17 Re: gcc -Wimplicit-fallthrough and pg_unreachable