Re: Removing unneeded self joins

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2018-11-21 12:54:20
Lists: pgsql-hackers

On 08/11/18 08:59, David Rowley wrote:
> On 19 October 2018 at 01:47, Alexander Kuzmenkov
> <a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
>> Here is a version that compiles.
> I had a quick read through this and I think it's missing about a 1-page
> comment section detailing when we can and when we cannot remove these
> self joins, and what measures we must take when we do remove them.

I added some explanation to the comment for remove_useless_joins. This
is probably still not clear enough, so if you have any particular
questions I'll cover them too. While improving the comments, I found
some bugs around the handling of join clauses and broken ECs, so I fixed
them and added the tests.

> Apart from that, I noted the following during my read:
> 1. I don't think this is the right way to do this. There are other
> places where we alter the varnoold. For example:
> search_indexed_tlist_for_var(). So you should likely be doing that too
> rather than working around it.


> 2. Surely the following loop is incorrect:
> for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
> {
> int attno = i - toKeep->min_attr;
> toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
> toRemove->attr_needed[attno]);
> }
> What if toRemove has a lower min_attr or higher max_attr?

This shouldn't happen because this is always the same relation, and
max_attr is its physical number of attributes. There is an assertion
about this in remove_self_joins_one_group:
            /* A sanity check: the relations have the same Oid. */
            Assert(root->simple_rte_array[relids[i]]->relid ==
                   root->simple_rte_array[relids[0]]->relid);

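As an aside, the merge can be modeled like this (a toy Python sketch, not the patch's C code; the dict layout and function name are invented for illustration). The same-relation invariant is what makes a single index safe for both rels:

```python
# Toy model (not PostgreSQL code) of merging per-attribute "needed by
# which relids" sets when a self join is removed. Because toKeep and
# toRemove reference the same base relation, their attribute ranges
# (min_attr .. max_attr) are identical, so one index works for both.

def merge_attr_needed(to_keep, to_remove):
    """Union to_remove's attr_needed sets into to_keep, in place."""
    # The invariant the Assert in remove_self_joins_one_group protects:
    # both rels describe the same relation, hence the same ranges.
    assert to_keep["min_attr"] == to_remove["min_attr"]
    assert to_keep["max_attr"] == to_remove["max_attr"]
    for i in range(to_keep["min_attr"], to_keep["max_attr"] + 1):
        attno = i - to_keep["min_attr"]
        to_keep["attr_needed"][attno] |= to_remove["attr_needed"][attno]
    return to_keep

keep = {"min_attr": -2, "max_attr": 1, "attr_needed": [set(), {1}, set(), set()]}
drop = {"min_attr": -2, "max_attr": 1, "attr_needed": [{2}, set(), set(), {3}]}
merge_attr_needed(keep, drop)
print(keep["attr_needed"])  # [{2}, {1}, set(), {3}]
```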
> 3. "wind" -> "find"
> + * When we wind such a join, we mark one of the participating relation as


> 4. I think the following shouldn't be happening:
> +------------------------------------------------
> Result
> One-Time Filter: false
> -(2 rows)
> + -> Index Scan using parent_pkey on parent x
> + Index Cond: (k = 1)
> +(4 rows)

This happens because, for join rels, we make some effort to prove that
they are empty and skip generating any paths for them, but we don't do
this for base rels. When the join is removed, this difference becomes
visible. Compare with this query:

postgres=# explain select * from parent where k = 1 and k = 2;
                                QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=0.15..8.17 rows=1 width=8)
   One-Time Filter: false
   ->  Index Scan using parent_pkey on parent  (cost=0.15..8.17 rows=1 width=8)
         Index Cond: (k = 1)
(4 rows)
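For what it's worth, the proof in this case boils down to noticing two incompatible equalities on the same column. A toy sketch of that idea (illustrative Python only, not the planner's actual predicate-proving code):

```python
# Illustration (not planner code) of why "k = 1 AND k = 2" can be
# proven empty: two equality constraints on the same column with
# different constants can never both hold, which is what lets the
# planner emit a One-Time Filter: false gating plan.

def contradictory(quals):
    """quals: list of (column, constant) equality conditions."""
    seen = {}
    for col, const in quals:
        if col in seen and seen[col] != const:
            return True          # same column equated to two constants
        seen[col] = const
    return False

print(contradictory([("k", 1), ("k", 2)]))   # True  -> proven empty
print(contradictory([("k", 1), ("j", 2)]))   # False -> cannot prove
```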

> 5. I'd have thought the opposite. Surely there are more chances of
> this being useful with more joins?
> + /* Limit the number of joins we process to control the quadratic behavior. */
> + if (n > join_collapse_limit)
> + break;

That is true, but we also have to think about the overhead when we don't
find any joins to remove. Without this cutoff, we have to examine every
candidate pair of relations, so the run time grows quadratically with
the number of references to the same relation in the query. I don't have
a better idea of how to control this overhead.
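To make the quadratic behavior concrete (a Python illustration, not the patch code):

```python
# Without a cutoff, a pairwise check over n references to the same
# relation examines n * (n - 1) / 2 pairs, so doubling n roughly
# quadruples the work even when no join turns out to be removable.

from itertools import combinations

def pairs_examined(n):
    """Number of pairs a pairwise self-join check must consider."""
    return len(list(combinations(range(n), 2)))

for n in (4, 8, 16, 32):
    print(n, pairs_examined(n))  # 6, 28, 120, 496
```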

> 6. In remove_self_joins_one_level() I think you should collect the
> removed relations in a Relids rather than a list.


Alexander Kuzmenkov
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
remove-self-join-v7.patch text/x-patch 56.7 KB
