| From: | Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> | 
|---|---|
| To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> | 
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Removing unneeded self joins | 
| Date: | 2018-11-21 12:54:20 | 
| Message-ID: | 23f6cf6d-32ca-faa2-f4ec-da690ba42482@postgrespro.ru | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
El 08/11/18 a las 08:59, David Rowley escribió:
> On 19 October 2018 at 01:47, Alexander Kuzmenkov
> <a(dot)kuzmenkov(at)postgrespro(dot)ru>  wrote:
>> Here is a version that compiles.
> I had a quick read through this and I think its missing about a 1-page
> comment section detailing when we can and when we cannot remove these
> self joins, and what measures we must take when we do remove them.
I added some explanation to the comment for remove_useless_joins. This 
is probably still not clear enough, so if you have any particular 
questions I'll cover them too. While improving the comments, I found 
some bugs around the handling of join clauses and broken ECs, so I fixed 
them and added the tests.
> Apart from that, I noted the following during my read:
>
> 1. I don't think this is the right way to do this. There are other
> places where we alter the varnoold. For example:
> search_indexed_tlist_for_var(). So you should likely be doing that too
> rather than working around it.
Fixed.
> 2. Surely the following loop is incorrect:
>
> for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
> {
> int attno = i - toKeep->min_attr;
> toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
> toRemove->attr_needed[attno]);
> }
>
> What if toRemove has a lower min_attr or higher max_attr?
This shouldn't happen because this is always the same relation, and 
max_attr is its physical number of attributes. There is an assertion 
about this in remove_self_joins_one_group:
             /* A sanity check: the relations have the same Oid. */
             Assert(root->simple_rte_array[relids[i]]->relid == 
root->simple_rte_array[relids[o]]->relid);
> 3. "wind" -> "find"
>
> + * When we wind such a join, we mark one of the participating relation as
Fixed.
> 4. I think the following shouldn't be happening:
>
> +------------------------------------------------
>    Result
>      One-Time Filter: false
> -(2 rows)
> +   ->  Index Scan using parent_pkey on parent x
> +         Index Cond: (k = 1)
> +(4 rows)
This happens because for join rels, we make some effort to prove that 
they are empty and not make any paths for them, and we don't do this for 
base rels. When we remove the join, this difference is exposed. Compare 
to this query:
postgres=# explain select * from parent where k = 1 and k = 2;
                                    QUERY PLAN
────────────────────────────────────────────────────────────────────────────────
  Result  (cost=0.15..8.17 rows=1 width=8)
    One-Time Filter: false
    ->  Index Scan using parent_pkey on parent  (cost=0.15..8.17 rows=1 
width=8)
          Index Cond: (k = 1)
(4 rows)
> 5. I'd have thought the opposite. Surely there are more chances of
> this being useful with more joins?
>
> + /* Limit the number of joins we process to control the quadratic behavior. */
> + if (n > join_collapse_limit)
> + break;
That is true, but we also have to think about the overhead when we don't 
find any joins to remove. Without this cutoff, we have to examine every 
pair of self-joins, so the run time grows quadratically with the number 
of such joins in the query. I don't have a better idea on how to control 
this.
> 6. In remove_self_joins_one_level() I think you should collect the
> removed relations in a Relids rather than a list.
Done.
-- 
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
| Attachment | Content-Type | Size | 
|---|---|---|
| remove-self-join-v7.patch | text/x-patch | 56.7 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexander Kuzmenkov | 2018-11-21 13:02:47 | Re: [HACKERS] PoC: full merge join on comparison clause | 
| Previous Message | Michael Banck | 2018-11-21 12:35:35 | Re: Online verification of checksums |