Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Soloviev <sergey(dot)soloviev(at)tantorlabs(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error
Date: 2025-08-27 06:56:24
Message-ID: CAMbWs4_14JzUQaKkAhxyv40aWOQsGtZ1hWuYeaSg3OjT_w7fQA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
> > redundant expressions, I'm wondering if we could introduce a function
> > that detects whether a given expression is known equal to a constant
> > by the EquivalenceClass machinery. This function should not be too
> > costly, as we'd only need to examine ECs that contain pseudoconstants.

> I did consider that, but rejected it. I kind of like the fact that
> the v3 patch gets rid of duplicated columns-to-be-made-unique.
> Yes, repeatedly doing make_pathkeys_for_sortclauses is a bit grotty,
> because it's O(N^2) in the number of columns-to-be-made-unique ...
> but realistically, how often is that ever more than a couple?
> In the common case where the semijoin has only one join column,
> the patch adds basically zero work, which is nice too.

Fair point. I'm now inclined to agree that the v3 patch is the better
approach, as it removes duplicate grouping expressions.

I looked into the first loose end you mentioned earlier and tried to
write a query that would produce duplicate hash keys for HashAggregate
but failed. After checking all the callers of create_agg_path(), it
seems that we now eliminate duplicate grouping keys in all cases.

That commit mentioned the possibility that duplicate columns might
have different hash functions assigned, so we may still need the
changes in the executor. But we need to update the test case with a
new query that produces duplicate hash keys, or remove it if we cannot
find one.

On the other hand, we may want to add a similar query in join.sql to
show that duplicate grouping expressions can be removed during
unique-ification.

> > We could then use this function to remove expressions that are known
> > constant from semi_rhs_exprs. And if we find that all expressions
> > in semi_rhs_exprs are known constant (the second loose end you
> > mentioned), we can give up building unique paths and fall back to a
> > traditional JOIN_SEMI.

> Yeah, I was thinking we could just use the paths of the existing
> rel, but really this case means that we'd need to de-dup down
> to a single row. We could maybe do something involving plastering
> LIMIT 1 on top of each input path, but it's such a weird and
> probably-useless-in-the-real-world case that I don't think it's
> worth expending effort on. Failing outright seems fine.

Yeah. When computing semi_rhs_exprs in compute_semijoin_info(), we
punt if we find that there are no columns to unique-ify.

/* Punt if we didn't find at least one column to unique-ify */
if (semi_rhs_exprs == NIL)
return;

I think the case we're discussing here -- where all semi_rhs_exprs are
known constants -- is essentially the same in that it leaves us no
columns to unique-ify. So it should be fine to just punt in this case
as well.

So I think we may need to add the following changes:

+ /* If there are no columns left to unique-ify, return NULL. */
+ if (sortPathkeys == NIL && groupClause == NIL)
+ {
+ MemoryContextSwitchTo(oldcontext);
+ return NULL;
+ }
+
/* build unique paths based on input rel's pathlist */

> > Another question we need to consider is how to fix this error in v18,
> > where it seems we'll need to remove redundant expressions during
> > createplan.c.

> Ugh ... I forgot you just refactored all that code.
>
> I wonder if the right answer in v18 is to back out a3179ab69.
> Not fixing that performance regression for another year would
> be mildly annoying; but AFAIR we've still had only the one
> complaint, so it seems like it's not hurting many people.
> And we are getting mighty late in the release calendar to be
> inventing new stuff in v18.

Yeah, inventing new stuff in v18 at this point seems risky. I think
backing out a3179ab69 should be fine for v18.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-08-27 07:26:08 Re: SQL:2023 JSON simplified accessor support
Previous Message Masahiko Sawada 2025-08-27 06:41:36 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart