Re: Removing unneeded self joins

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Greg Stark <stark(at)mit(dot)edu>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2022-05-19 10:48:18
Message-ID: a1d6290c-44e0-0dfc-3fca-66a68b3109ef@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5/17/22 19:14, Ronan Dunklau wrote:
> Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
>> New version of the feature.
>> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
>> when uniqueness of an inner deduced not from join quals, but from a
>> baserestrictinfo clauses 'x=const', where x - unique field.
>> Code, dedicated to solve second issue is controversial, so i attached
>> delta.txt for quick observation.
>> Maybe we should return to previous version of code, when we didn't split
>> restriction list into join quals and base quals?
>
> Hello,
>
> I tried to find problematic cases, which would make the planning time grow
> unacceptably, and couldn't devise it.
>
> The worst case scenario I could think of was as follows:
>
> - a query with many different self joins
> - an abundance of unique indexes on combinations of this table columns to
> consider
> - additional predicates on the where clause on columns.
Looking into the patch I can imagine, that the most difficult case is
when a set of relations with the same OID is huge, but only small part
of them (or nothing) can be removed.
Also, removing a clause from restrictinfo list or from equivalence class
adds non-linear complexity. So, you can dig this way ).
>
> The base table I used for this was a table with 40 integers. 39 unique indexes
> were defined on every combination of (c1, cX) with cX being columns c2 to c40.
>
> I turned geqo off, set from_collapse_limit and join_collapse_limit to
> unreasonably high values (30), and tried to run queries of the form:
>
> SELECT * FROM test_table t1
> JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
> ...
> JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
>
> So no self join can be eliminated in that case.
I think, you should compare t1.cX with tX.cX to eliminate self-join.
Cross-unique-index proof isn't supported now.
> The performance was very similar with or without the GUC enabled. I tested the
> same thing without the patch, since the test for uniqueness has been slightly
> altered and incurs a new allocation, but it doesn't seem to change.
>
> One interesting side effect of this patch, is that removing any unneeded self
> join cuts down the planification time very significantly, as we lower the number
> of combinations to consider.
Even more - removing a join we improve cardinality estimation.
>
> As for the code:
>
> - Comments on relation_has_unique_index_ext and relation_has_unique_index_for
> should be rewritten, as relation_has_unique_index_for is now just a special
> case of relation_has_unique_index_ext. By the way, the comment would probably
> be better read as: "but if extra_clauses isn't NULL".
> - The whole thing about "extra_clauses", ie, baserestrictinfos which were
> used to determine uniqueness, is not very clear. Most functions where the new
> argument has been added have not seen an update in their comments, and the
> name itself doesn't really convey the intented meaning: perhaps
> required_non_join_clauses ?
>
> The way this works should be explained a bit more thoroughly, for example in
> remove_self_joins_one_group the purpose of uclauses should be explained. The
> fact that degenerate_case returns true when we don't have any additional base
> restrict info is also confusing, as well as the degenerate_case name.
Agree,
but after this case thoughts wander in my head: should we make one step
back to pre-[1] approach? It looks like we have quite similar changes,
but without special function for a 'degenerate case' detection and
restrictlist splitting.
>
> I'll update if I think of more interesting things to add.
Thank you for your efforts!

See in attachment next version which fixes mistakes detected by
zyu(at)yugabyte(dot)com(dot)

[1]
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment Content-Type Size
v34-0001-Remove-self-joins.patch text/x-patch 83.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-05-19 11:12:31 Re: Build-farm - intermittent error in 031_column_list.pl
Previous Message Amit Kapila 2022-05-19 10:40:27 Re: Zstandard support for toast compression