Clean up remove_rel_from_query() after self-join elimination commit

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Clean up remove_rel_from_query() after self-join elimination commit
Date: 2026-04-06 08:11:03
Message-ID: CAMbWs48JC4OVqE=3gMB6se2WmRNNfMyFyYxm-09vgpm+Vwe8Hg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While working on reusing remove_rel_from_query() for inner-join
removal, I noticed that the function is in pretty rough shape. The
self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only. This resulted in several issues:

1. Comments throughout remove_rel_from_query() still assumed only
left-join removal, making the code misleading. For example:

* This is relevant in case the outer join we're deleting is nested inside
* other outer joins:

2. This was called even for left-join removal, with subst=-1. This is
pointless and confusing.

ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
replace_relid_callback);

3. phinfo->ph_lateral was adjusted for left-join removal, which is
also confusing ...

phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);

/*
* ph_lateral might contain rels mentioned in ph_eval_at after the
* replacement, remove them.
*/
phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
phinfo->ph_eval_at);

... since you can find this Assert just above:

Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));

4. The comment about attr_needed reconstruction was in
remove_rel_from_query(), but the actual rebuild is performed by
the callers.

5. The EquivalenceClass processing in remove_rel_from_query():

/*
* Likewise remove references from EquivalenceClasses.
*/
foreach(l, root->eq_classes)
{
EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);

if (bms_is_member(relid, ec->ec_relids) ||
(sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
remove_rel_from_eclass(ec, sjinfo, relid, subst);
}

The condition always evaluates to true for self-join elimination
(i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
But this is redundant because the caller remove_self_join_rel()
already handles ECs via update_eclasses().

6. In remove_self_join_rel(), I notice this code:

/* At last, replace varno in root targetlist and HAVING clause */
ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
toKeep->relid, 0, replace_relid_callback);
ChangeVarNodesExtended((Node *) root->processed_groupClause,
toRemove->relid, toKeep->relid, 0,
replace_relid_callback);

The comment mentions "HAVING clause", but neither processed_tlist nor
processed_groupClause has anything to do with the HAVING clause.
Furthermore, processed_groupClause contains SortGroupClause nodes,
which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
on it is pointless.

The attached patch is an attempt to fix all these issues. It also
aims to leave the code better structured for adding new types of join
removal (such as inner-join removal) in the future.

Thoughts?

- Richard

Attachment Content-Type Size
v1-0001-Clean-up-remove_rel_from_query-after-self-join-el.patch application/octet-stream 26.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-04-06 08:21:40 Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
Previous Message Илья Чердаков 2026-04-06 07:55:25 Environment variable to disable diffs file output