Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic plans and "initial" pruning
Date: 2022-04-08 05:49:39
Message-ID: CA+HiwqE_AV1PAyVeHez89zo7CgH=4dSMBOKi0Rgxt81xEqH9QQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 9:41 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Thu, 7 Apr 2022 at 20:28, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > Here's an updated version. In particular, I removed the
> > part_prune_results list from PortalData; anything that needs to
> > look at the list can instead get it from the CachedPlan
> > (PortalData.cplan). This makes things better in 2 ways:
>
> Thanks for making those changes.
>
> I'm not overly familiar with the data structures we use for passing
> plans around between the planner and executor, but storing the pruning
> results in CachedPlan seems pretty bad. I see you've stashed it in
> there and invented a new memory context to stop leaks into the cache
> memory.
>
> Since I'm not overly familiar with these structures, I'm trying to
> imagine why you made that choice and the best I can come up with was
> that it was the most convenient thing you had to hand inside
> CheckCachedPlan().

Yeah, it's that way because it felt convenient, though I have wondered
if a simpler scheme that doesn't require any changes to the CachedPlan
data structure might be better after all. Your pointing it out has
made me think a bit harder on that.

> I don't really have any great ideas right now on how to make this
> better. I wonder if GetCachedPlan() should be changed to return some
> struct that wraps up the CachedPlan with some sort of executor prep
> info struct that we can stash the list of PartitionPruneResults in,
> and perhaps something else one day.

I think what might be better to do now is just add an output List
parameter to GetCachedPlan() that it adds the PartitionPruneResult
nodes to, instead of stashing them in CachedPlan as it does now.
IMHO, we should leave inventing a new generic struct to the next
project that makes it necessary to return more information from
GetCachedPlan() to its users. I find it hard to convincingly describe
what the new generic struct really is if we invent it *now*, when it
would carry a single list whose purpose is pretty narrow.

So, I've implemented this by making the callers of GetCachedPlan()
pass a list into which any PartitionPruneResults produced are added.
Most callers can put that list into the Portal to pass it on to other
modules, so I have reinstated PortalData.part_prune_results. As for
its memory management, the list and the PartitionPruneResults therein
are allocated in the context that holds the Portal itself.

> Some lesser important stuff that I think could be done better.
>
> * Are you also able to put meaningful comments on the
> PartitionPruneResult struct in execnodes.h?
>
> * In create_append_plan() and create_merge_append_plan() you have the
> same code to set the part_prune_index. Why not just move all that code
> into make_partition_pruneinfo() and have make_partition_pruneinfo()
> return the index and append to the PlannerInfo.partPruneInfos List?

That sounds better, so done.

> * Why not forboth() here?
>
> i = 0;
> foreach(stmtlist_item, portal->stmts)
> {
>     PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
>     PartitionPruneResult *part_prune_result = part_prune_results ?
>         list_nth(part_prune_results, i) :
>         NULL;
>
>     i++;

Because the PartitionPruneResult list may not always be available; to
wit, it's only available when it was GetCachedPlan() that gave the
portal its plan. I know this is a bit ugly, but it seems better than
fixing all users of Portal to build a dummy list, not that the
ugliness is totally avoidable even in the current implementation.

> * It would be good if ReleaseExecutorLocks() already knew the RTIs
> that were locked. Maybe the executor prep info struct I mentioned
> above could also store the RTIs that have been locked already and
> allow ReleaseExecutorLocks() to just iterate over those to release the
> locks.

Rewrote this such that ReleaseExecutorLocks() just receives a list of
per-PlannedStmt bitmapsets containing the RT indexes of only the
locked entries in that plan.

Attached updated patch with these changes.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v13-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch application/octet-stream 99.1 KB
