Re: Removing unneeded self joins

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2022-07-04 13:49:59
Message-ID: 2375492.jE0xQCEvom@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> On 19/5/2022 16:47, Ronan Dunklau wrote:
> > I'll take a look at that one.
> New version of the patch, rebased on current master:
> 1. pgindent over the patch have passed.
> 2. number of changed files is reduced.
> 3. Some documentation and comments is added.

Hello Andrey,

Thanks for the updates.

The general approach seems sensible to me, so I'm going to focus on some details.

In a very recent thread [1], Tom Lane is proposing to add infrastructure to make Var aware of their nullability by outer joins. I wonder if that would help with avoiding the need for adding is not null clauses when the column is known not null ?
If we have a precedent for adding a BitmapSet to the Var itself, maybe the whole discussion regarding keeping track of nullability can be extended to the original column nullability ?

Also, I saw it was mentioned earlier in the thread but how difficult would it be to process the transformed quals through the EquivalenceClass machinery and the qual simplification ?
For example, if the target audience of this patch is ORM, or inlined views, it wouldn't surprise me to see queries of this kind in the wild, which could be avoided altogether:

postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a = s2.a where s1.b = 2 and s2.b =3;
Seq Scan on sj s2
Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)

+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);

Why don't you use foreach() and foreach_delete_current macros for iterating and removing items in the lists, both in update_ec_members and update_ec_sources ?

+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))

I think this is more compact and easier to follow than the previous version, but I'm not sure how common it is in postgres source code to use that kind of construction ?

Some review about the comments:

I see you keep using the terms "the keeping relation" and "the removing relation" in reference to the relation that is kept and the one that is removed.
Aside from the grammar (the kept relation or the removed relation), maybe it would make it clearer to call them something else. In other parts of the code, you used "the remaining relation / the removed relation" which makes sense.

* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.

This could be rephrased: ", optionally replacing it with another relid. The latter is used by the self-join removing process."


Ronan Dunklau

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-07-04 13:55:30 Re: pgsql: Change timeline field of IDENTIFY_SYSTEM to int8
Previous Message Alvaro Herrera 2022-07-04 13:27:13 Re: list of TransactionIds