Re: Removing unneeded self joins

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2022-05-19 11:47:36
Message-ID: 13021914.uLZWGnKmhe@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le jeudi 19 mai 2022, 12:48:18 CEST Andrey Lepikhov a écrit :
> On 5/17/22 19:14, Ronan Dunklau wrote:
> > Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> >> New version of the feature.
> >> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> >> when uniqueness of an inner deduced not from join quals, but from a
> >> baserestrictinfo clauses 'x=const', where x - unique field.
> >> Code, dedicated to solve second issue is controversial, so i attached
> >> delta.txt for quick observation.
> >> Maybe we should return to previous version of code, when we didn't split
> >> restriction list into join quals and base quals?
> >
> > Hello,
> >
> > I tried to find problematic cases, which would make the planning time grow
> > unacceptably, and couldn't devise it.
> >
> > The worst case scenario I could think of was as follows:
> > - a query with many different self joins
> > - an abundance of unique indexes on combinations of this table columns
> > to
> >
> > consider
> >
> > - additional predicates on the where clause on columns.
>
> Looking into the patch I can imagine, that the most difficult case is
> when a set of relations with the same OID is huge, but only small part
> of them (or nothing) can be removed.
> Also, removing a clause from restrictinfo list or from equivalence class
> adds non-linear complexity. So, you can dig this way ).
>
> > The base table I used for this was a table with 40 integers. 39 unique
> > indexes were defined on every combination of (c1, cX) with cX being
> > columns c2 to c40.
> >
> > I turned geqo off, set from_collapse_limit and join_collapse_limit to
> > unreasonably high values (30), and tried to run queries of the form:
> >
> > SELECT * FROM test_table t1
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
> > ...
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
> >
> > So no self join can be eliminated in that case.
>
> I think, you should compare t1.cX with tX.cX to eliminate self-join.
> Cross-unique-index proof isn't supported now.

Yes, that's the point. I wanted to try to introduce as much complexity as I
could, without actually performing any self join elimination. The idea was to
try to come up with the worst case scenario.
>
> > The performance was very similar with or without the GUC enabled. I tested
> > the same thing without the patch, since the test for uniqueness has been
> > slightly altered and incurs a new allocation, but it doesn't seem to
> > change.
> >
> > One interesting side effect of this patch, is that removing any unneeded
> > self join cuts down the planification time very significantly, as we
> > lower the number of combinations to consider.
>
> Even more - removing a join we improve cardinality estimation.
>
> > As for the code:
> > - Comments on relation_has_unique_index_ext and
> > relation_has_unique_index_for>
> > should be rewritten, as relation_has_unique_index_for is now just a
> > special
> > case of relation_has_unique_index_ext. By the way, the comment would
> > probably be better read as: "but if extra_clauses isn't NULL".
> >
> > - The whole thing about "extra_clauses", ie, baserestrictinfos which
> > were
> >
> > used to determine uniqueness, is not very clear. Most functions where the
> > new argument has been added have not seen an update in their comments,
> > and the name itself doesn't really convey the intented meaning: perhaps
> > required_non_join_clauses ?
> >
> > The way this works should be explained a bit more thoroughly, for example
> > in remove_self_joins_one_group the purpose of uclauses should be
> > explained. The fact that degenerate_case returns true when we don't have
> > any additional base restrict info is also confusing, as well as the
> > degenerate_case name.
> Agree,
> but after this case thoughts wander in my head: should we make one step
> back to pre-[1] approach? It looks like we have quite similar changes,
> but without special function for a 'degenerate case' detection and
> restrictlist splitting.

I'll take a look at that one.

>
> > I'll update if I think of more interesting things to add.
>
> Thank you for your efforts!
>
> See in attachment next version which fixes mistakes detected by
> zyu(at)yugabyte(dot)com(dot)
>
> [1]
> https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW
> 5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com

--
Ronan Dunklau

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-05-19 12:07:24 Re: bogus: logical replication rows/cols combinations
Previous Message Andrew Dunstan 2022-05-19 11:28:53 Re: Addition of PostgreSQL::Test::Cluster::pg_version()