| From: | Thom Brown <thom(at)linux(dot)com> |
|---|---|
| To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
| Cc: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: generic plans and "initial" pruning |
| Date: | 2026-05-28 13:13:46 |
| Message-ID: | CAA-aLv4EdaMXs+NkZYEHMUr7qcLbJweg=uNwqQq43KC+ycuk6A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, 28 May 2026 at 09:14, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> Hi Thom,
>
> On Wed, May 27, 2026 at 9:03 PM Thom Brown <thom(at)linux(dot)com> wrote:
> >
> > On Sat, 4 Apr 2026 at 13:11, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > >
> > > Attached is a redesigned version. While working on the previous
> > > design, I grew increasingly uncomfortable with CachedPlanPrepData --
> > > it was smuggling executor state out of GetCachedPlan() through an
> > > out-parameter, which papered over the real problem: GetCachedPlan()
> > > was doing too much. The main change in this version is architectural:
> > > GetCachedPlan() no longer acquires execution locks. Callers now own
> > > that responsibility, which is natural because each call site iterates
> > > stmt_list differently and manages execution state in its own way --
> > > and it lets them choose between conservative lock-all and
> > > pruning-aware locking where appropriate.
> > >
> > > Non-portal call sites remain on the conservative path for now.
> > > _SPI_execute_plan requires care around snapshot setup, which happens
> > > after plan fetch rather than before. SQL functions have a different
> > > issue: init_execution_state() fetches the plan while postquel_start()
> > > handles execution, with execution_state containers in between, making
> > > it harder to thread a prepped QueryDesc through. The portal path and
> > > EXPLAIN EXECUTE cover the most common
> > > prepared-statement-with-partitions workloads; the remaining sites can
> > > be converted incrementally.
> > >
> > > This is now starting to feel closer to what Tom suggested back in
> > > January 2023 [1], where he proposed getting rid of
> > > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> > > lock acquisition out to callers. He noted that "we'd be pushing the
> > > responsibility for looping back and re-planning out to fairly
> > > high-level calling code" and that "we'd definitely be changing some
> > > fundamental APIs." That is the direction I came around to over the
> > > last couple of weeks while wrestling with CachedPlanPrepData. The
> > > reverted approach also tried to follow Tom's direction but moved
> > > locking into ExecutorStart(), which forced it to handle plan
> > > invalidation from inside the executor by mutating the CachedPlan
> > > in-place. This version moves locking out to the callers instead, so
> > > the executor and plan cache never reach into each other.
> > >
> > > The series is now four patches:
> > >
> > > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> > > AcquireExecutorLocks() as a caller-facing function with validity check
> > > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> > > portal retry logic. All callers are converted. No behavioral change.
> > >
> > > 0002: Refactor executor's initial partition pruning setup. Cleanup
> > > only, no behavioral change.
> > >
> > > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> > > range table init, permission checks, and initial pruning out of
> > > InitPlan(). Scaffolding for 0004; all callers still go through the
> > > normal ExecutorStart() path.
> > >
> > > 0004: Use pruning-aware locking for single-statement cached plans.
> > > Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> > > ExecutorPrep() to determine surviving partitions, then locks only
> > > those. Extends PortalLockCachedPlan() with a pruning-aware path for
> > > eligible plans. Multi-statement CachedPlans (from rule rewriting)
> > > always use conservative locking. In principle, this could be relaxed
> > > if the planner can prove that no pruning expression reads state
> > > modified by an earlier statement, but that is left for a future patch.
> > > Includes regression tests.
> > >
> > > In case it's not clear, I'm not targeting v19 at this point. I'd like
> > > to get this into v20 CF1 and would welcome review from anyone
> > > interested.
> >
> > After not having looked at this in close to 2 years, I thought I'd
> > give it another look.
>
> Thanks for taking a look.
>
> > Not found any user-facing issues, and I'm liking
> > seeing so few locks in pg_locks. I can see that with pruning disabled,
> > the fallback works, pruning-aware locking is working via SPI through
> > plpgsql, running ALTER between executions and also invalidating
> > indexes force replans, and it's looking good.
> >
> > But I also think there might be a bug in patch 0001, but I'd
> > appreciate checking my reasoning because I'm not fully confident I've
> > been diligent enough.
> >
> > When PortalStart() opens a SELECT cursor that's backed by a cached
> > plan, it does roughly the following. It builds a queryDesc (an
> > executor-side struct), one of whose fields is a pointer into the plan
> > tree inside the portal's cached plan. Then it calls
> > PortalLockCachedPlan() to acquire the necessary locks, and finally
> > hands the queryDesc over to the executor.
> >
> > My worry is about what happens if the cached plan turns out to be
> > stale, for instance because someone ran DDL on a referenced table. In
> > that case PortalLockCachedPlan() throws the old plan away (via
> > ReleaseCachedPlan) and fetches a freshly-built replacement, updtating
> > the portal's own pointers to match. But the queryDesc from earlier
> > isn't touched. Its plan pointer still references the old, now-released
> > plan. From what I can see, once that old plan's last reference is
> > dropped its memory can be freed, which would leave the executor
> > reading from freed memory in the next step.
> >
> > The bit I'm least sure about is whether the old plan's memory really
> > does get reclaimed straight away when its refcount hits zero. If
> > something keeps it alive longer then this isn't a bug, or at least not
> > as bad as I'm making out. I had a look but couldn't convince myself
> > either way from the code alone. To actually hit this you'd need a
> > cursor on a cached plan, plus an invalidation arriving in the small
> > window between the portal being set up and the cursor being opened.
> > The race condition is brief, and I've not been able to hit it in
> > testing.
> >
> > The thing that got me thinking this is real: patch 0004 modifies
> > PortalLockCachedPlan() so that whenever it replans, it also rebuilds
> > the queryDesc. That's pretty much the fix I'd expect for this, which
> > makes me suspect somebody hit it at some point. But 0004 only applies
> > that fix on the new pruning-aware code path, and it was mentioned in
> > the thread that 0001 to 0003 might land before 0004. If so, master
> > would carry the bug in the gap between the two.
> >
> > I suspect a way to deal with it would be to move the CreateQueryDesc
> > call in the SELECT case to after PortalLockCachedPlan() returns, which
> > is what the other portal strategies already seem to do. Alternatively,
> > you could bring 0004's changes in this area into 0001 and have
> > PortalLockCachedPlan() always rebuild the queryDesc when it replans.
> >
> > If I've got this wrong and there's some lifetime mechanism I missed
> > that keeps the old plan's memory alive, then it's a non-issue and I'm
> > misreading the code. If I have got it wrong, could you please add
> > comments to make what is going on clearer?
>
> It's a real bug.
>
> You're right that if PortalLockCachedPlan() replans, the QueryDesc
> created before the call still points at the old PlannedStmt from the
> released plan. And yes, 0004 happens to fix it by rebuilding the
> QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> broken on their own.
>
> Attached is an updated set with the fix: CreateQueryDesc now runs
> after PortalLockCachedPlan() returns, as you suggested. That said,
> I'll probably focus first on settling the plancache refactoring that
> spun off from this thread [1], and then start a new thread for the
> pruning-aware locking work on top of it, incorporating parts of this
> series.
Thanks.
I've done another pass. I see a reference to
AcquireExecutorLocksUnpruned(), but I can't find this function. Is
this supposed to be AcquireExecutorLocksPrepared()?
And also I have a question about the new firstResultRels code
If I've followed it right, the bit in setrefs.c records the
lowest-numbered RT index from leaf_result_relids as the
per-ModifyTable fallback that's used when all real targets get pruned
away, and the executor side looks it up via
linitial_int(node->resultRelations). For that to work those two have
to pick the same RT index, and the comment justifies it with
"partition expansion preserves RT index order". Where is that
preservation guaranteed?
And with the assertion in ExecInitModifyTable:
Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
With writable CTEs producing more than one ModifyTable node the list
has several entries, so all the assert really checks is that some
recorded entry matches, not that the one recorded for this particular
node matches. If that's correct, then in a case where the wrong entry
happened to line up the right relation wouldn't be locked and nothing
would complain. Is there something that keeps these in order
somewhere?
Thom
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2026-05-28 13:23:59 | Re: Row pattern recognition |
| Previous Message | Fujii Masao | 2026-05-28 12:55:46 | Re: Set notice receiver before libpq connection startup |