Quick Links

Re: Removing unneeded self joins

From:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-07-04 13:49:59
Message-ID:	2375492.jE0xQCEvom@aivenronan
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> On 19/5/2022 16:47, Ronan Dunklau wrote:
> > I'll take a look at that one.
>
> New version of the patch, rebased on current master:
> 1. pgindent over the patch have passed.
> 2. number of changed files is reduced.
> 3. Some documentation and comments is added.

Hello Andrey,

Thanks for the updates.

The general approach seems sensible to me, so I'm going to focus on some details.

In a very recent thread [1], Tom Lane is proposing to add infrastructure to make Var aware of their nullability by outer joins. I wonder if that would help with avoiding the need for adding is not null clauses when the column is known not null ?
If we have a precedent for adding a BitmapSet to the Var itself, maybe the whole discussion regarding keeping track of nullability can be extended to the original column nullability ?

Also, I saw it was mentioned earlier in the thread but how difficult would it be to process the transformed quals through the EquivalenceClass machinery and the qual simplification ?
For example, if the target audience of this patch is ORM, or inlined views, it wouldn't surprise me to see queries of this kind in the wild, which could be avoided altogether:

postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a = s2.a where s1.b = 2 and s2.b =3;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj s2
Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)

+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
....
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);

Why don't you use foreach() and foreach_delete_current macros for iterating and removing items in the lists, both in update_ec_members and update_ec_sources ?

+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))

I think this is more compact and easier to follow than the previous version, but I'm not sure how common it is in postgres source code to use that kind of construction ?

Some review about the comments:

I see you keep using the terms "the keeping relation" and "the removing relation" in reference to the relation that is kept and the one that is removed.
Aside from the grammar (the kept relation or the removed relation), maybe it would make it clearer to call them something else. In other parts of the code, you used "the remaining relation / the removed relation" which makes sense.

/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.

This could be rephrased: ", optionally replacing it with another relid. The latter is used by the self-join removing process."

[1] https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us

--
Ronan Dunklau

In response to

Re: Removing unneeded self joins at 2022-06-30 14:11:51 from Andrey Lepikhov

Responses

Re: Removing unneeded self joins at 2022-07-04 14:16:31 from Zhihong Yu

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2022-07-04 13:55:30	Re: pgsql: Change timeline field of IDENTIFY_SYSTEM to int8
Previous Message	Alvaro Herrera	2022-07-04 13:27:13	Re: list of TransactionIds