| From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
|---|---|
| To: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Clean up remove_rel_from_query() after self-join elimination commit |
| Date: | 2026-04-06 08:11:03 |
| Message-ID: | CAMbWs48JC4OVqE=3gMB6se2WmRNNfMyFyYxm-09vgpm+Vwe8Hg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
While working on reusing remove_rel_from_query() for inner-join
removal, I noticed that the function is in pretty rough shape. The
self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only. This resulted in several issues:
1. Comments throughout remove_rel_from_query() still assumed only
left-join removal, making the code misleading. For example:
* This is relevant in case the outer join we're deleting is nested inside
* other outer joins:
2. This was called even for left-join removal, with subst=-1. This is
pointless and confusing.
ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
replace_relid_callback);
3. phinfo->ph_lateral was adjusted for left-join removal, which is
also confusing ...
phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
/*
* ph_lateral might contain rels mentioned in ph_eval_at after the
* replacement, remove them.
*/
phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
phinfo->ph_eval_at);
... since you can find this Assert just above:
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
4. The comment about attr_needed reconstruction was in
remove_rel_from_query(), but the actual rebuild is performed by
the callers.
5. The EquivalenceClass processing in remove_rel_from_query():
/*
* Likewise remove references from EquivalenceClasses.
*/
foreach(l, root->eq_classes)
{
EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
if (bms_is_member(relid, ec->ec_relids) ||
(sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
remove_rel_from_eclass(ec, sjinfo, relid, subst);
}
The condition always evaluates to true for self-join elimination
(i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
But this is redundant because the caller remove_self_join_rel()
already handles ECs via update_eclasses().
6. In remove_self_join_rel(), I notice this code:
/* At last, replace varno in root targetlist and HAVING clause */
ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
toKeep->relid, 0, replace_relid_callback);
ChangeVarNodesExtended((Node *) root->processed_groupClause,
toRemove->relid, toKeep->relid, 0,
replace_relid_callback);
The comment mentions "HAVING clause", but neither processed_tlist nor
processed_groupClause has anything to do with the HAVING clause.
Furthermore, processed_groupClause contains SortGroupClause nodes,
which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
on it is pointless.
The attached patch is an attempt to fix all these issues. It also
aims to leave the code better structured for adding new types of join
removal (such as inner-join removal) in the future.
Thoughts?
- Richard
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Clean-up-remove_rel_from_query-after-self-join-el.patch | application/octet-stream | 26.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-04-06 08:21:40 | Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt |
| Previous Message | Илья Чердаков | 2026-04-06 07:55:25 | Environment variable to disable diffs file output |