Re: generic plans and "initial" pruning

From: Thom Brown <thom(at)linux(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic plans and "initial" pruning
Date: 2026-05-29 10:30:29
Message-ID: CAA-aLv6NiPNMzeOv4g_bZdyzSvBHZLqabKv1vgrf8CmJHRqJ-w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 29 May 2026 at 09:57, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Thu, May 28, 2026 at 10:14 PM Thom Brown <thom(at)linux(dot)com> wrote:
> > On Thu, 28 May 2026 at 09:14, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > It's a real bug.
> > >
> > > You're right that if PortalLockCachedPlan() replans, the QueryDesc
> > > created before the call still points at the old PlannedStmt from the
> > > released plan. And yes, 0004 happens to fix it by rebuilding the
> > > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> > > broken on their own.
> > >
> > > Attached is an updated set with the fix: CreateQueryDesc now runs
> > > after PortalLockCachedPlan() returns, as you suggested. That said,
> > > I'll probably focus first on settling the plancache refactoring that
> > > spun off from this thread [1], and then start a new thread for the
> > > pruning-aware locking work on top of it, incorporating parts of this
> > > series.
> >
> > Thanks.
> >
> > I've done another pass. I see a reference to
> > AcquireExecutorLocksUnpruned(), but I can't find this function. Is
> > this supposed to be AcquireExecutorLocksPrepared()?
>
> You're right, stale comment. It should say
> AcquireExecutorLocksPrepared(). Fixed.
>
> > And also I have a question about the new firstResultRels code
> >
> > If I've followed it right, the bit in setrefs.c records the
> > lowest-numbered RT index from leaf_result_relids as the
> > per-ModifyTable fallback that's used when all real targets get pruned
> > away, and the executor side looks it up via
> > linitial_int(node->resultRelations). For that to work those two have
> > to pick the same RT index, and the comment justifies it with
> > "partition expansion preserves RT index order". Where is that
> > preservation guaranteed?
>
> The ordering comes from expand_inherited_rtentry(), which adds child
> partitions to the range table sequentially in partition bound order.
> Since ModifyTable.resultRelations is built from the same expansion,
> its first element is the lowest-numbered RT index among the leaf
> partitions for that node. That is the same value
> bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
> because Bitmapset iteration returns members in ascending order. I've
> added a comment in setrefs.c pointing to expand_inherited_rtentry() as
> the source of this guarantee.
>
> > And with the assertion in ExecInitModifyTable:
> >
> > Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
> >
> > With writable CTEs producing more than one ModifyTable node the list
> > has several entries, so all the assert really checks is that some
> > recorded entry matches, not that the one recorded for this particular
> > node matches. If that's correct, then in a case where the wrong entry
> > happened to line up the right relation wouldn't be locked and nothing
> > would complain. Is there something that keeps these in order
> > somewhere?
>
> This is a fair observation -- the Assert checks membership in the
> global list rather than per-node correspondence. But node A's rti
> can't accidentally pass the Assert by matching an entry recorded for
> node B. Each ModifyTable node gets its own partition expansion with
> distinct RT entries. In a writable CTE like:
>
> WITH upd1 AS (UPDATE t SET ...),
> upd2 AS (UPDATE t SET ...)
> UPDATE t SET ...
>
> each UPDATE creates a separate set of leaf partition RT entries --
> upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
> gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
> When ExecInitModifyTable falls back to linitial_int(resultRelations)
> for a given node, it finds that node's own entry, because the RT index
> sets are disjoint across nodes.
>
> That said, it's worth being explicit about what protections exist at
> each layer, since this is safety-critical code:
>
> 1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
> firstResultRels unconditionally. So regardless of which rti a
> ModifyTable node falls back to, the relation will be locked.
>
> 2. ExecGetRangeTableRelation() has two checks when opening a relation.
> For non-result relations (isResultRel=false), it checks
> es_unpruned_relids and raises an ERROR in release builds if the
> relation was pruned. For result relations (isResultRel=true), that
> check is intentionally skipped -- it has to be, because at least one
> result relation per ModifyTable node must remain openable even when
> all partitions are pruned, since executor code paths like ExecMerge()
> and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
> (see commit 28317de723b). The remaining protection for result
> relations is Assert(CheckRelationLockedByMe()) inside table_open,
> which fires in debug builds.
>
> 3. I've tightened ExecInitModifyTable to close this gap: the
> all-pruned fallback path now raises an elog(ERROR) in release builds
> if linitial_int(resultRelations) is not found in firstResultRels,
> rather than just an Assert. This gives result relations a
> production-visible check comparable to what es_unpruned_relids
> provides for scan relations.
>
> So the net effect is that for scan relations, opening a
> pruned-and-unlocked relation is caught by an ERROR in production via
> es_unpruned_relids. For result relations on the all-pruned fallback
> path, it's now also caught by an ERROR in production via the
> firstResultRels check in ExecInitModifyTable. The locking in
> AcquireExecutorLocksPrepared() ensures the relation is always locked
> regardless.
>
> Thanks again for the review. A close look at these aspects by someone
> other than me is very useful.

Ah, the disjoint RT-entries point is what I was missing. I'd been
reading firstResultRels as a flat list where in theory any entry could
line up with any node's lookup, which is what made the assert feel
potentially insufficient. If each ModifyTable's expansion produces its
own non-overlapping set of leaf RT indexes then membership in the
global list really is equivalent to membership in this node's own
entry, and the assert is sufficient as it stands. Walking through the
writable-CTE case helped.

Thanks

Thom

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-05-29 10:43:03 Re: [PATCH] Improve REPACK (CONCURRENTLY) error messages for unsupported configurations
Previous Message Nazir Bilal Yavuz 2026-05-29 09:51:29 Re: Heads Up: cirrus-ci is shutting down June 1st