From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Richard Guo <guofenglinux(at)gmail(dot)com> |
Subject: | Re: BUG #18953: Planner fails to build plan for complex query with LATERAL references |
Date: | 2025-06-23 18:24:01 |
Message-ID: | 3886446.1750703041@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
I wrote:
> Doing things the way Richard wanted to (ie using the nestloop's
> required_outer) gets past the "failed to assign all NestLoopParams"
> error, but then we get the same "unrecognized node type: 22" error
> as in the first example.
> That error seems considerably nastier to fix. I think it is a
> pre-existing problem, though I don't currently have an explanation
> why we've not seen it reported before. What is happening is that
> we pull up a PHV for "c", containing a SubLink for (SELECT true),
> and only some copies of it get put through preprocess_expression,
> which is responsible for converting SubLinks to SubPlans among
> other essential tasks. It seems that up to now we've always
> managed to use only preprocessed copies in the finished plan.
> But now a not-preprocessed copy is managing to get through to
> the executor, which promptly spits up because it doesn't know
> what to do with a SubLink.
> We could probably hack this in a localized way by ensuring that
> we push a preprocessed copy of the PHV into the left plan's tlist.
> (One way to get that would be to look in the placeholder_list for the
> correct phid.) However, now that I've seen this I have very little
> faith that there aren't other bugs of the same ilk, that somehow
> we've not seen up to now. I'm thinking about what we must do to
> ensure that all PHVs are preprocessed once and only once, perhaps
> at the moment they get put into the placeholder_list.
I've traced through this in more detail, and found that it's
inaccurate to claim that the PHV's expression escaped preprocessing
altogether. (If it had, there would be more failure modes besides
"unprocessed SubLink", and we'd likely have noticed sooner.)
Rather, the problem is that SS_process_sublinks() skips replacement of
SubLinks within PlaceHolderVars that have phlevelsup > 0, expecting
that to be handled later. So the sequence of events that Alexander's
test case causes is:
1. We pull up the first sub-select "(SELECT (SELECT true) AS c FROM t,
LATERAL (SELECT b LIMIT 1))", causing the "WHERE c" in the later
LATERAL sub-select to be replaced by PHV((SELECT true)); a PHV is
needed since the pulled-up sub-select was under an outer join.
The PHV has phlevelsup = 1 since it's pushed down into a sub-select.
2. Expression pre-processing happens in the separately-planned
LATERAL sub-select, and while it does most of what's needed, it
explicitly skips replacement of the PHV's SubLink.
3. SS_replace_correlation_vars pulls the PHV out of the LATERAL
sub-select, replacing it with a Param and adding it to the
plan_params of the outer query level. Now it has phlevelsup = 0,
but there's still a SubLink inside it.
4. The code added by a16ef313f2 copies the PHV verbatim from
the plan_params list (via a subquery rel's subplan_params and
root->curOuterParams) into the outer subplan's tlist. Kaboom!
Now, if the PHV matched something that was already in the
outer subplan's tlist (recall it only needs to match by phid)
then we're safe, because that copy would have gotten fixed up
during extract_lateral_references. But in this example that
doesn't happen, probably because the PHV contains no Vars
so it doesn't get put into any baserel's reltarget.
I'm still trying to wrap my head around whether there are any
related failure modes in released branches. Steps 1-3 certainly
have been happening just like that for a long time, so that there
can be a PHV with an unexpanded SubLink floating around the
outer query level's data structures. Is the a16ef313f2 code
really the only way that that version of the PHV can make its way
into the emitted plan tree? Maybe, but I'm far from convinced.
Anyway, with one eye on the possibility that we'll need to back-patch
this in some form, I went for the localized fix of ensuring that
we use a copy of the PHV that's been through the normal processing.
I do want to look into restructuring the handling of these PHVs
to make this less messy, but that's not a job to undertake for v18
(much less if we have to back-patch).
One thing worth noting is that there's a visible shortcoming in the
generated plans for the new test cases: there are two copies of the
InitPlan that arises from the PHV's SubLink. One of them is
unreferenced and hence doesn't get used at runtime. That happens
because we are performing SS_process_sublinks on the PHV twice.
It's not fatal, but it's not great either. Perhaps a better design
could avoid that. I definitely don't want to adopt a fix that
results in still another InitPlan, so adding YA invocation of
preprocessing doesn't sound good.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-bug-18953-some-more.patch | text/x-diff | 11.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ajin Cherian | 2025-06-24 00:20:06 | Re: BUG #18509: Logical decoding behaves badly when processing a change record for a table with altered column |
Previous Message | Masahiko Sawada | 2025-06-23 14:05:35 | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |