Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic plans and "initial" pruning
Date: 2026-05-29 08:56:51
Message-ID: CA+HiwqGq2S+NL3Q8sYh2u9XLXmNYTy-Z6HqmTW4VHUahB=yqjw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 28, 2026 at 10:14 PM Thom Brown <thom(at)linux(dot)com> wrote:
> On Thu, 28 May 2026 at 09:14, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > It's a real bug.
> >
> > You're right that if PortalLockCachedPlan() replans, the QueryDesc
> > created before the call still points at the old PlannedStmt from the
> > released plan. And yes, 0004 happens to fix it by rebuilding the
> > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> > broken on their own.
> >
> > Attached is an updated set with the fix: CreateQueryDesc now runs
> > after PortalLockCachedPlan() returns, as you suggested. That said,
> > I'll probably focus first on settling the plancache refactoring that
> > spun off from this thread [1], and then start a new thread for the
> > pruning-aware locking work on top of it, incorporating parts of this
> > series.
>
> Thanks.
>
> I've done another pass. I see a reference to
> AcquireExecutorLocksUnpruned(), but I can't find this function. Is
> this supposed to be AcquireExecutorLocksPrepared()?

You're right, stale comment. It should say
AcquireExecutorLocksPrepared(). Fixed.

> And also I have a question about the new firstResultRels code
>
> If I've followed it right, the bit in setrefs.c records the
> lowest-numbered RT index from leaf_result_relids as the
> per-ModifyTable fallback that's used when all real targets get pruned
> away, and the executor side looks it up via
> linitial_int(node->resultRelations). For that to work those two have
> to pick the same RT index, and the comment justifies it with
> "partition expansion preserves RT index order". Where is that
> preservation guaranteed?

The ordering comes from expand_inherited_rtentry(), which adds child
partitions to the range table sequentially in partition bound order.
Since ModifyTable.resultRelations is built from the same expansion,
its first element is the lowest-numbered RT index among the leaf
partitions for that node. That is the same value
bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
because Bitmapset iteration returns members in ascending order. I've
added a comment in setrefs.c pointing to expand_inherited_rtentry() as
the source of this guarantee.

> And with the assertion in ExecInitModifyTable:
>
> Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
>
> With writable CTEs producing more than one ModifyTable node the list
> has several entries, so all the assert really checks is that some
> recorded entry matches, not that the one recorded for this particular
> node matches. If that's correct, then in a case where the wrong entry
> happened to line up the right relation wouldn't be locked and nothing
> would complain. Is there something that keeps these in order
> somewhere?

This is a fair observation -- the Assert checks membership in the
global list rather than per-node correspondence. But node A's rti
can't accidentally pass the Assert by matching an entry recorded for
node B. Each ModifyTable node gets its own partition expansion with
distinct RT entries. In a writable CTE like:

WITH upd1 AS (UPDATE t SET ...),
upd2 AS (UPDATE t SET ...)
UPDATE t SET ...

each UPDATE creates a separate set of leaf partition RT entries --
upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
When ExecInitModifyTable falls back to linitial_int(resultRelations)
for a given node, it finds that node's own entry, because the RT index
sets are disjoint across nodes.

That said, it's worth being explicit about what protections exist at
each layer, since this is safety-critical code:

1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
firstResultRels unconditionally. So regardless of which rti a
ModifyTable node falls back to, the relation will be locked.

2. ExecGetRangeTableRelation() has two checks when opening a relation.
For non-result relations (isResultRel=false), it checks
es_unpruned_relids and raises an ERROR in release builds if the
relation was pruned. For result relations (isResultRel=true), that
check is intentionally skipped -- it has to be, because at least one
result relation per ModifyTable node must remain openable even when
all partitions are pruned, since executor code paths like ExecMerge()
and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
(see commit 28317de723b). The remaining protection for result
relations is Assert(CheckRelationLockedByMe()) inside table_open,
which fires in debug builds.

3. I've tightened ExecInitModifyTable to close this gap: the
all-pruned fallback path now raises an elog(ERROR) in release builds
if linitial_int(resultRelations) is not found in firstResultRels,
rather than just an Assert. This gives result relations a
production-visible check comparable to what es_unpruned_relids
provides for scan relations.

So the net effect is that for scan relations, opening a
pruned-and-unlocked relation is caught by an ERROR in production via
es_unpruned_relids. For result relations on the all-pruned fallback
path, it's now also caught by an ERROR in production via the
firstResultRels check in ExecInitModifyTable. The locking in
AcquireExecutorLocksPrepared() ensures the relation is always locked
regardless.

Thanks again for the review. A close look at these aspects by someone
other than me is very useful.

--
Thanks, Amit Langote

Attachment Content-Type Size
v13-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch application/octet-stream 8.8 KB
v13-0002-Refactor-executor-s-initial-partition-pruning-se.patch application/octet-stream 7.3 KB
v13-0001-Move-execution-lock-acquisition-out-of-GetCached.patch application/octet-stream 16.2 KB
v13-0004-Use-pruning-aware-locking-for-single-statement-c.patch application/octet-stream 40.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2026-05-29 09:22:58 Re: Proposal: Conflict log history table for Logical Replication
Previous Message Joel Jacobson 2026-05-29 08:54:20 Re: Key joins