Quick Links

Re: Removing unneeded self joins

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-11-21 12:54:20
Message-ID:	23f6cf6d-32ca-faa2-f4ec-da690ba42482@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

El 08/11/18 a las 08:59, David Rowley escribió:
> On 19 October 2018 at 01:47, Alexander Kuzmenkov
> <a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
>> Here is a version that compiles.
> I had a quick read through this and I think its missing about a 1-page
> comment section detailing when we can and when we cannot remove these
> self joins, and what measures we must take when we do remove them.

I added some explanation to the comment for remove_useless_joins. This
is probably still not clear enough, so if you have any particular
questions I'll cover them too. While improving the comments, I found
some bugs around the handling of join clauses and broken ECs, so I fixed
them and added the tests.

> Apart from that, I noted the following during my read:
>
> 1. I don't think this is the right way to do this. There are other
> places where we alter the varnoold. For example:
> search_indexed_tlist_for_var(). So you should likely be doing that too
> rather than working around it.

Fixed.

> 2. Surely the following loop is incorrect:
>
> for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
> {
> int attno = i - toKeep->min_attr;
> toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
> toRemove->attr_needed[attno]);
> }
>
> What if toRemove has a lower min_attr or higher max_attr?

This shouldn't happen because this is always the same relation, and
max_attr is its physical number of attributes. There is an assertion
about this in remove_self_joins_one_group:
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[relids[i]]->relid ==
root->simple_rte_array[relids[o]]->relid);

> 3. "wind" -> "find"
>
> + * When we wind such a join, we mark one of the participating relation as

Fixed.

> 4. I think the following shouldn't be happening:
>
> +------------------------------------------------
> Result
> One-Time Filter: false
> -(2 rows)
> + -> Index Scan using parent_pkey on parent x
> + Index Cond: (k = 1)
> +(4 rows)

This happens because for join rels, we make some effort to prove that
they are empty and not make any paths for them, and we don't do this for
base rels. When we remove the join, this difference is exposed. Compare
to this query:

postgres=# explain select * from parent where k = 1 and k = 2;
                                   QUERY PLAN
────────────────────────────────────────────────────────────────────────────────
Result (cost=0.15..8.17 rows=1 width=8)
   One-Time Filter: false
   -> Index Scan using parent_pkey on parent (cost=0.15..8.17 rows=1
width=8)
         Index Cond: (k = 1)
(4 rows)

> 5. I'd have thought the opposite. Surely there are more chances of
> this being useful with more joins?
>
> + /* Limit the number of joins we process to control the quadratic behavior. */
> + if (n > join_collapse_limit)
> + break;

That is true, but we also have to think about the overhead when we don't
find any joins to remove. Without this cutoff, we have to examine every
pair of self-joins, so the run time grows quadratically with the number
of such joins in the query. I don't have a better idea on how to control
this.

> 6. In remove_self_joins_one_level() I think you should collect the
> removed relations in a Relids rather than a list.

Done.

--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
remove-self-join-v7.patch	text/x-patch	56.7 KB

In response to

Re: Removing unneeded self joins at 2018-11-08 05:59:19 from David Rowley

Responses

Re: Removing unneeded self joins at 2018-12-24 16:28:01 from Alexander Kuzmenkov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alexander Kuzmenkov	2018-11-21 13:02:47	Re: [HACKERS] PoC: full merge join on comparison clause
Previous Message	Michael Banck	2018-11-21 12:35:35	Re: Online verification of checksums