Re: generic plans and "initial" pruning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: generic plans and "initial" pruning
Date: 2023-07-13 12:58:38
Message-ID: CA+HiwqEDbf=+s73hAF0PigWORRx+YWwbCQtuuWtHzc3ko_DGpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 6, 2023 at 11:29 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Mon, Jul 3, 2023 at 10:27 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > > On 8 Jun 2023, at 16:23, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > Here is a new version.
> >
> > The local planstate variable in the hunk below is shadowing the function
> > parameter planstate which cause a compiler warning:
>
> Thanks Daniel for the heads up.
>
> Attached new version fixes that and contains a few other notable
> changes. Before going into the details of those changes, let me
> reiterate in broad strokes what the patch is trying to do.
>
> The idea is to move the locking of some tables referenced in a cached
> (generic) plan from plancache/GetCachedPlan() to the
> executor/ExecutorStart(). Specifically, the locking of inheritance
> child tables. Why? Because partition pruning with "initial pruning
> steps" contained in the Append/MergeAppend nodes may eliminate some
> child tables that need not have been locked to begin with, though the
> pruning can only occur during ExecutorStart().
>
> After applying this patch, GetCachedPlan() only locks the tables that
> are directly mentioned in the query to ensure that the
> analyzed-rewritten-but-unplanned query tree backing a given CachedPlan
> is still valid (cf RevalidateCachedQuery()), but not the tables in the
> CachedPlan that would have been added by the planner. Tables in a
> CachePlan that would not be locked currently only include the
> inheritance child tables / partitions of the tables mentioned in the
> query. This means that the plan trees in a given CachedPlan returned
> by GetCachedPlan() are only partially valid and are subject to
> invalidation because concurrent sessions can possibly modify the child
> tables referenced in them before ExecutorStart() gets around to
> locking them. If the concurrent modifications do happen,
> ExecutorStart() is now equipped to detect them by way of noticing that
> the CachedPlan is invalidated and inform the caller to discard and
> recreate the CachedPlan. This entails changing all the call sites of
> ExecutorStart() that pass it a plan tree from a CachedPlan to
> implement the replan-and-retry-execution loop.
>
> Given the above, ExecutorStart(), which has not needed so far to take
> any locks (except on indexes mentioned in IndexScans), now needs to
> lock child tables if executing a cached plan which contains them. In
> the previous versions, the patch used a flag passed in
> EState.es_top_eflags to signal ExecGetRangeTableRelation() to lock the
> table. The flag would be set in ExecInitAppend() and
> ExecInitMergeAppend() for the duration of the loop that initializes
> child subplans with the assumption that that's where the child tables
> would be opened. But not all child subplans of Append/MergeAppend
> scan child tables (think UNION ALL queries), so this approach can
> result in redundant locking. Worse, I needed to invent
> PlannedStmt.elidedAppendChildRelations to separately track child
> tables whose Scan nodes' parent Append/MergeAppend would be removed by
> setrefs.c in some cases.
>
> So, this new patch uses a flag in the RangeTblEntry itself to denote
> if the table is a child table instead of the above roundabout way.
> ExecGetRangeTableRelation() can simply look at the RTE to decide
> whether to take a lock or not. I considered adding a new bool field,
> but noticed we already have inFromCl to track if a given RTE is for
> table/entity directly mentioned in the query or for something added
> behind-the-scenes into the range table as the field's description in
> parsenodes.h says. RTEs for child tables are added behind-the-scenes
> by the planner and it makes perfect sense to me to mark their inFromCl
> as false. I can't find anything that relies on the current behavior
> of inFromCl being set to the same value as the root inheritance parent
> (true). Patch 0002 makes this change for child RTEs.
>
> A few other notes:
>
> * A parallel worker does ExecutorStart() without access to the
> CachedPlan that the leader may have gotten its plan tree from. This
> means that parallel workers do not have the ability to detect plan
> tree invalidations. I think that's fine, because if the leader would
> have been able to launch workers at all, it would also have gotten all
> the locks to protect the (portion of) the plan tree that the workers
> would be executing. I had an off-list discussion about this with
> Robert and he mentioned his concern that each parallel worker would
> have its own view of which child subplans of a parallel Append are
> "valid" that depends on the result of its own evaluation of initial
> pruning. So, there may be race conditions whereby a worker may try
> to execute plan nodes that are no longer valid, for example, if the
> partition a worker considers valid is not viewed as such by the leader
> and thus not locked. I shared my thoughts as to why that sounds
> unlikely at [1], though maybe I'm a bit too optimistic?
>
> * For multi-query portals, you can't now do ExecutorStart()
> immediately followed by ExecutorRun() for each query in the portal,
> because ExecutorStart() may now fail to start a plan if it gets
> invalidated. So PortalStart() now does ExecutorStart()s for all
> queries and remembers the QueryDescs for PortalRun() then to do
> ExecutorRun()s using. A consequence of this is that
> CommandCounterIncrement() now must be done between the
> ExecutorStart()s of the individual plans in PortalStart() and not
> between the ExecutorRun()s in PortalRunMulti(). make check-world
> passes with this new arrangement, though I'm not entirely confident
> that there are no problems lurking.

In an absolutely brown-paper-bag moment, I realized that I had not
updated src/backend/executor/README to reflect the changes to the
executor's control flow that this patch makes. That is, after
scrapping the old design back in January whose details *were*
reflected in the patches before that redesign.

Anyway, the attached fixes that.

Tom, do you think you have bandwidth in the near future to give this
another look? I think I've addressed the comments that you had given
back in April, though as mentioned in the previous message, there may
still be some funny-looking aspects still remaining. In any case, I
have no intention of pressing ahead with the patch without another
committer having had a chance to sign off on it.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v41-0001-Add-field-to-store-parent-relids-to-Append-Merge.patch application/x-patch 21.2 KB
v41-0004-Track-opened-range-table-relations-in-a-List-in-.patch application/x-patch 2.4 KB
v41-0002-Set-inFromCl-to-false-in-child-table-RTEs.patch application/x-patch 3.7 KB
v41-0003-Delay-locking-of-child-tables-in-cached-plans-un.patch application/x-patch 128.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-07-13 13:06:49 Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL
Previous Message Matthias van de Meent 2023-07-13 12:02:36 Potential memory leak in contrib/intarray's g_intbig_compress