Re: generic plans and "initial" pruning

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic plans and "initial" pruning
Date: 2022-01-06 06:44:33
Message-ID: CAAJ_b95Yf-rY0f0KQC+iWY9xzBZ74VhOgpF-sZ2CsYjLj9-xNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 31, 2021 at 7:56 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>>
>> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> >
>> > Executing generic plans involving partitions is known to become slower
>> > as partition count grows due to a number of bottlenecks, with
>> > AcquireExecutorLocks() showing at the top in profiles.
>> >
>> > Previous attempt at solving that problem was by David Rowley [1],
>> > where he proposed delaying locking of *all* partitions appearing under
>> > an Append/MergeAppend until "initial" pruning is done during the
>> > executor initialization phase. A problem with that approach that he
>> > has described in [2] is that leaving partitions unlocked can lead to
>> > race conditions where the Plan node belonging to a partition can be
>> > invalidated when a concurrent session successfully alters the
>> > partition between AcquireExecutorLocks() saying the plan is okay to
>> > execute and then actually executing it.
>> >
>> > However, using an idea that Robert suggested to me off-list a little
>> > while back, it seems possible to determine the set of partitions that
>> > we can safely skip locking. The idea is to look at the "initial" or
>> > "pre-execution" pruning instructions contained in a given Append or
>> > MergeAppend node when AcquireExecutorLocks() is collecting the
>> > relations to lock and consider relations from only those sub-nodes
>> > that survive performing those instructions. I've attempted
>> > implementing that idea in the attached patch.
>> >
>>
>> In which cases, we will have "pre-execution" pruning instructions that
>> can be used to skip locking partitions? Can you please give a few
>> examples where this approach will be useful?
>
>
> This is mainly to be useful for prepared queries, so something like:
>
> prepare q as select * from partitioned_table where key = $1;
>
> And that too when execute q(…) uses a generic plan. Generic plans are problematic because it must contain nodes for all partitions (without any plan time pruning), which means CheckCachedPlan() has to spend time proportional to the number of partitions to determine that the plan is still usable / has not been invalidated; most of that is AcquireExecutorLocks().
>
> Other bottlenecks, not addressed in this patch, pertain to some executor startup/shutdown subroutines that process the range table of a PlannedStmt in its entirety, whose length is also proportional to the number of partitions when the plan is generic.
>
>> The benchmark is showing good results, indeed.
>
Indeed.

Here are few comments for v1 patch:

+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);

- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];

- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
+ if (parentrelids)
+ *parentrelids = NULL;

You got two blank lines after Assert.
--

+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
}

... [Skip]

+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
}

I think this work should be left to the caller.
--

/*
* Stuff that follows matches exactly what ExecCreatePartitionPruneState()
* does, except we don't need a PartitionPruneState here, so don't call
* that function.
*
* XXX some refactoring might be good.
*/

+1, while doing it would be nice if foreach_current_index() is used
instead of the i & j sequence in the respective foreach() block, IMO.
--

+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }

I think instead of get_plan_scanrelids() the
GetLockableRelations_worker() can be used; if so, then no need to add
get_plan_scanrelids() function.
--

/* Nodes containing prunable subnodes. */
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;

...
if (pruneinfo && pruneinfo->contains_init_steps)
{
int i;
...
return false;
}
}
break;

Most of the declarations need to be moved inside the if-block.

Also, initially, I was a bit concerned regarding this code block
inside GetLockableRelations_worker(), what if (pruneinfo &&
pruneinfo->contains_init_steps) evaluated to false? After debugging I
realized that plan_tree_walker() will do the needful -- a bit of
comment would have helped.
--

+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;

Why not plan_walk_members() call like other nodes?

Regards,
Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message SATYANARAYANA NARLAPURAM 2022-01-06 07:00:10 Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Previous Message Amit Langote 2022-01-06 06:43:59 Re: sqlsmith: ERROR: XX000: bogus varno: 2