| From: | Amit Langote <amitlangote09(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com> |
| Subject: | Re: generic plans and "initial" pruning |
| Date: | 2026-03-25 07:39:47 |
| Message-ID: | CA+HiwqFx0kmGqSDcLrE37KkHS2T9O1NoBitZT4mA4yJBBt_QjA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > Attached is v6 of the patch series. I've been working toward
> > > committing this, so I wanted to lay out the ExecutorPrep() design and
> > > the key trade-offs before doing so.
> > >
> > > When a cached generic plan references a partitioned table,
> > > GetCachedPlan() locks all partitions upfront via
> > > AcquireExecutorLocks(), even those that initial pruning will
> > > eliminate. But initial partition pruning only runs later during
> > > ExecutorStart(). Moving pruning earlier requires some executor setup
> > > (range table, permissions, pruning state), and ExecutorPrep() is the
> > > vehicle for that. Unlike the approach reverted last May, this
> > > keeps the CachedPlan itself unchanged -- all per-execution state flows
> > > through a separate CachedPlanPrepData that the caller provides.
> > >
> > > The approach also keeps GetCachedPlan()'s interface
> > > backward-compatible: the new CachedPlanPrepData argument is optional.
> > > If a caller passes NULL, all partitions are locked as before and
> > > nothing changes. This means existing callers and any new code that
> > > calls GetCachedPlan() without caring about pruning-aware locking just
> > > works.
> > >
> > > The risk is on the other side: if a caller does pass a
> > > CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> > > partitions and populate prep_estates with the EStates that
> > > ExecutorPrep() created. The caller then must make those EStates
> > > available to ExecutorStart() -- via QueryDesc->estate,
> > > portal->prep_estates, or the equivalent path for SPI and SQL
> > > functions. If it fails to do so, ExecutorStart() will call
> > > ExecutorPrep() again, which may compute different pruning results than
> > > the original call, potentially expecting locks on relations that were
> > > never acquired. The executor would then operate on relations it
> > > doesn't hold locks on.
> > >
> > > So the contract is: if you opt in to pruning-aware locking by passing
> > > CachedPlanPrepData, you must complete the pipeline by delivering the
> > > prep EStates to the executor. In the current patch, all the call sites
> > > that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> > > EXPLAIN) do thread the EStates through correctly, and I've tried to
> > > make the plumbing straightforward enough that it's hard to get wrong.
> > > But it is a new invariant that didn't exist before, and a caller that
> > > gets it wrong would fail silently rather than with an obvious error.
> > >
> > > To catch such violations, I've added a debug-only check in
> > > standard_ExecutorStart() that fires when no prep EState was provided.
> > > It iterates over the plan's rtable and verifies that every lockable
> > > relation is actually locked. It should always be true if
> > > AcquireExecutorLocks() locked everything, but would fail if
> > > pruning-aware locking happened upstream and the caller dropped the
> > > prep EState. The check is skipped in parallel workers, which acquire
> > > relation locks lazily in ExecGetRangeTableRelation().
> > >
> > > +	if (queryDesc->estate == NULL)
> > > +	{
> > > +#ifdef USE_ASSERT_CHECKING
> > > +		if (!IsParallelWorker())
> > > +		{
> > > +			ListCell   *lc;
> > > +
> > > +			foreach(lc, queryDesc->plannedstmt->rtable)
> > > +			{
> > > +				RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> > > +
> > > +				if (rte->rtekind == RTE_RELATION ||
> > > +					(rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> > > +					Assert(CheckRelationOidLockedByMe(rte->relid,
> > > +													  rte->rellockmode,
> > > +													  true));
> > > +			}
> > > +		}
> > > +#endif
> > > +		queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> > > +										 queryDesc->params,
> > > +										 CurrentResourceOwner,
> > > +										 true,
> > > +										 eflags);
> > > +	}
> > > +#ifdef USE_ASSERT_CHECKING
> > > +	else
> > > +	{
> > > +		/*
> > > +		 * A prep EState was provided, meaning pruning-aware locking
> > > +		 * should have locked at least the unpruned relations.
> > > +		 */
> > > +		if (!IsParallelWorker())
> > > +		{
> > > +			int			rtindex = -1;
> > > +
> > > +			while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
> > > +											  rtindex)) >= 0)
> > > +			{
> > > +				RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> > > +
> > > +				Assert(rte->rtekind == RTE_RELATION ||
> > > +					   (rte->rtekind == RTE_SUBQUERY &&
> > > +						rte->relid != InvalidOid));
> > > +				Assert(CheckRelationOidLockedByMe(rte->relid,
> > > +												  rte->rellockmode, true));
> > > +			}
> > > +		}
> > > +	}
> > > +#endif
> > >
> > > So the invariant is: if no prep EState was provided, every relation in
> > > the plan is locked; if one was provided, at least the unpruned
> > > relations are locked. Both are checked in assert builds.
> > >
> > > I think this covers the main concerns, but I may be missing something.
> > > If anyone sees a problem with this approach, I'd like to hear about
> > > it.
> >
> > Here's v7. Some plancache.c changes that I'd made were in the wrong
> > patch in v6; this version puts them where they belong.
>
> Attached is an updated set. One more fix: I added an Assert in
> SPI_cursor_open_internal()'s !plan->saved path to verify that
> prep_estates is NIL. Unsaved plans always take the custom plan path,
> so pruning-aware locking never applies, but it's worth guarding
> explicitly since the copyObject/ReleaseCachedPlan sequence that
> follows would not be safe otherwise. Also changed
> SPI_plan_get_cached_plan() to pass NULL for cprep, since it only
> returns the CachedPlan pointer and has no way to deliver prep_estates
> to anyone.
>
> Stepping back -- the core question is whether running executor logic
> (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> and executor have always had a clean boundary: plan cache locks
> everything, executor runs. This optimization necessarily crosses that
> line, because the information needed to decide which locks to skip
> (pruning results) can only come from executor machinery.
>
> The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> limited subset of executor work (range table init, permissions,
> pruning), carry the results out through CachedPlanPrepData, and leave
> the CachedPlan itself untouched. The executor already has a multi-step
> protocol: start/run/end. prep/start/run/end is just a finer
> decomposition of what InitPlan() was already doing inside
> ExecutorStart().
>
> Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> function support) and 0005 (parallel worker reuse) are useful
> follow-ons but not essential. The optimization works without them for
> most cases, and they can be reviewed and committed separately.
>
> If there's a cleaner way to avoid locking pruned partitions without
> the plumbing this patch adds, I haven't found it in the year since the
> revert. I'd welcome a pointer if you see one. Failing that, I think
> this is the right trade-off, but it's a judgment call about where to
> hold your nose.
>
> Tom, I'd value your opinion on whether this approach is something
> you'd be comfortable seeing in the tree.
Attached is an updated set with some cleanup after another pass.
- Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
ExecDoInitialPruning() handles both setup and pruning internally; the
split isn't needed yet.
- Tightened commit messages to describe what each commit does now, not
what later commits will use it for. In particular, 0002 is upfront
that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
up.
- Updated setrefs.c comment for firstResultRels to drop a blanket
claim about one ModifyTable per query level.
As before, 0001-0003 are the focus, and possibly 0004, which teaches
the new GetCachedPlan() pruning-aware contract to its relatively new
user in function.c.
--
Thanks, Amit Langote
| Attachment | Content-Type | Size |
|---|---|---|
| v9-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch | application/octet-stream | 7.8 KB |
| v9-0005-Reuse-partition-pruning-results-in-parallel-worke.patch | application/octet-stream | 15.8 KB |
| v9-0003-Use-pruning-aware-locking-in-cached-plans.patch | application/octet-stream | 41.8 KB |
| v9-0001-Refactor-executor-s-initial-partition-pruning-set.patch | application/octet-stream | 7.3 KB |
| v9-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch | application/octet-stream | 25.5 KB |