Re: v12: ERROR: subplan "InitPlan 2 (returns $4)" was not initialized

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: v12: ERROR: subplan "InitPlan 2 (returns $4)" was not initialized
Date: 2023-04-11 19:59:08
Message-ID: 507872.1681243148@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> postgres=# SET force_parallel_mode =1; CREATE TABLE x (i int) PARTITION BY RANGE (i); CREATE TABLE x1 PARTITION OF x DEFAULT ;
> select * from pg_class,
> lateral (select pg_catalog.bit_and(1)
> from pg_class as sample_1
> where case when EXISTS (
> select 1 from x
> where EXISTS (
> select 1 from pg_catalog.pg_depend
> where (sample_1.reltuples is NULL)
> )) then 1 end
> is NULL)x
> where false;

Interesting. The proximate cause is that we end up with a subplan
that is marked parallel_safe, but it has an initplan that is not
parallel_safe. The parallel worker receives, and tries to initialize,
the parallel_safe subplan, and falls over because of its reference
to the unsafe subplan -- which was not transmitted to the worker.

Actually, because of the policy installed by commit ab77a5a45, the
mere fact of having an initplan should be enough to disqualify
the first subplan from being marked parallel-safe.

I dug around and found the culprit: setrefs.c's
clean_up_removed_plan_level() moves initplans down from a parent
to a child plan node, but it forgot the possibility that the
child plan node had been marked parallel_safe before that and
must not be anymore.

The v1 patch attached is enough to fix the immediate issue,
but there's another thing not to like, which is that we're also
discarding the costs associated with the initplans. That's
strictly cosmetic given that all the planning decisions are
already made, but it still seems potentially annoying if you're
trying to understand EXPLAIN output. So I'm inclined to instead
do something like v2 attached, which deals with that as well.
(On the other hand, we aren't bothering to fix up costs when
we move initplans around in materialize_finished_plan or
standard_planner ... so maybe that should be left for a patch
that fixes those things too.)

Another thing worth wondering about is whether we can't loosen
commit ab77a5a45's policy that having an initplan is enough
to make you parallel-unsafe. In the wake of later fixes,
notably 5e6d8d2bb, it seems like maybe we could allow that
as long as the initplans themselves are parallel-safe. That
wouldn't be material for back-patching though, so I'll worry
about it later.

Not sure what if anything to do about a test case. I'm not
excited about memorializing the specific case found by sqlsmith,
because it seems only very accidental that it exposes this
problem. I found that there are existing regression tests
that exercise the situation where clean_up_removed_plan_level
generates an incorrectly-marked plan, but there is accidentally
no bad effect. (The planner itself isn't going to be making
any further decisions with the bogus info; it's only
ExecSerializePlan that pays attention to the flag, and we'd only
notice in this specific cross-reference situation.) Also, any
change we make along the lines speculated about in the previous
para would be highly likely to break a test case, in the sense
that it'd no longer exercise the previously-failing scenario.
So on the whole I'm inclined not to bother with a new test case.

regards, tom lane

Attachment Content-Type Size
fix-parallel-safety-flags-v1.patch text/x-diff 654 bytes
fix-parallel-safety-flags-v2.patch text/x-diff 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-04-11 20:03:37 Re: [PATCH] Use role name "system_user" instead of "user" for unsafe_tests
Previous Message Andres Freund 2023-04-11 19:56:24 Re: Assertion being hit during WAL replay