Re: Removing unneeded self joins

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2020-11-29 17:10:55
Message-ID: 4bc7e3f9-d955-c213-dbcd-c41d4f83bbe6@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28/11/2020 19:21, Andrey Lepikhov wrote:
> On 27.11.2020 21:49, Heikki Linnakangas wrote:
>> On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
>>> +            /*
>>> +             * Process restrictlist to seperate out the self join
>>> quals from
>>> +             * the other quals. e.g x = x goes to selfjoinquals and a
>>> = b to
>>> +             * otherjoinquals.
>>> +             */
>>> +            split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>>> +                                 &otherjoinquals);
>>> +
>>> +            if (list_length(selfjoinquals) == 0)
>>> +            {
>>> +                /*
>>> +                 * Have a chance to remove join if target list
>>> contains vars from
>>> +                 * the only one relation.
>>> +                 */
>>
>> I don't understand the logic here. If 'selfjoinquals' is empty, it means
>> that there is no join qual between the two relations, right? How can we
>> ever remove the join in that case? And how does the target list affect
>> that? Can you give an example query of that?
>
> Maybe it is a problem of variable naming. Following the idea of David
> Rowley, we split quals into two subsets: {x==x} and another, for example
> {x=y}.
> First set is an trivial case of self-join: if we have unique index on
> the attribute 'x', then this join is self-join.
> Second set is give us a chance: if right side is unique for right side
> of the qual and no vars from right side end up in the target list of the
> join, then this is a self-join case. Example:
>
> CREATE TABLE a(x int, y int);
> CREATE UNIQUE INDEX ON a(x);
> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
> CREATE UNIQUE INDEX ON a(y);
> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too

The latter join is not "useless". The patch is returning incorrect
result for that query:

> postgres=# insert into a values (1, 2);
> INSERT 0 1
> postgres=# insert into a values (2, 1);
> INSERT 0 1
> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
> x | y
> ---+---
> (0 rows)
>
> postgres=# set enable_self_join_removal=off;
> SET
> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT RESULT
> x | y
> ---+---
> 1 | 2
> 2 | 1
> (2 rows)

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul A Jungwirth 2020-11-29 17:11:05 Re: range_agg
Previous Message Tom Lane 2020-11-29 17:10:21 Re: Printing LSN made easy