Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-08 09:18:44
Message-ID: CAA4eK1JMpPZcWknaa1-afbJ=bof3JmV1HWHbVRkThbkwah4_Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
..
>
> v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch
> -------------------------------------------------------------------
>
> @@ -1021,12 +1039,15 @@ IsInParallelMode(void)
> * Prepare for entering parallel mode, based on command-type.
> */
> void
> -PrepareParallelMode(CmdType commandType)
> +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
> {
> Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
> if (IsModifySupportedInParallelMode(commandType))
> {
> + if (isParallelModifyLeader)
> + (void) GetCurrentCommandId(true);
>
> I miss a comment here. I suppose this is to set currentCommandIdUsed, so that
> the leader process gets a new commandId for the following statements in the
> same transaction, and thus it can see the rows inserted by the parallel
> workers?
>

oh no, leader backend and worker backends must use the same commandId.
I am also not sure if we need this because for Insert statements we
already call GetCurrentCommandId(true) is standard_ExecutorStart. We
don't want the rows visibility behavior for parallel-inserts any
different than non-parallel ones.

> If my understanding is correct, I think that the leader should not participate
> in the execution of the Insert node, else it would use higher commandId than
> the workers. That would be weird, although probably not data corruption.
>

Yeah, exactly this is the reason both leader and backends must use the
same commandId.

> I
> wonder if parallel_leader_participation should be considered false for the
> "Gather -> Insert -> ..." plans.
>

If what I said above is correct then this is moot.

>
>
> @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate)
> }
>
> /* Run plan locally if no workers or enabled and not single-copy. */
> - node->need_to_scan_locally = (node->nreaders == 0)
> + node->need_to_scan_locally = (node->nworkers_launched <= 0)
> || (!gather->single_copy && parallel_leader_participation);
> node->initialized = true;
> }
>
> Is this change needed? The code just before this test indicates that nreaders
> should be equal to nworkers_launched.
>

This change is required because we don't need to set up readers for
parallel-insert unless there is a returning clause. See the below
check a few lines before this change:

- if (pcxt->nworkers_launched > 0)
+ if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader &&
!isParallelModifyWithReturning))
{

I think this check could be simplified to if (pcxt->nworkers_launched
> 0 && isParallelModifyWithReturning) or something like that.

>
> In grouping_planner(), this branch
>
> + /* Consider a supported parallel table-modification command */
> + if (IsModifySupportedInParallelMode(parse->commandType) &&
> + !inheritance_update &&
> + final_rel->consider_parallel &&
> + parse->rowMarks == NIL)
> + {
>
> is very similar to creation of the non-parallel ModifyTablePaths - perhaps an
> opportunity to move the common code into a new function.
>

+1.

>
> @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
> }
> }
>
> + if (parallel_modify_partial_path_count > 0)
> + {
> + final_rel->rows = current_rel->rows; /* ??? why hasn't this been
> + * set above somewhere ???? */
> + generate_useful_gather_paths(root, final_rel, false);
> + }
> +
> extra.limit_needed = limit_needed(parse);
> extra.limit_tuples = limit_tuples;
> extra.count_est = count_est;
>
> A boolean variable (e.g. have_parallel_modify_paths) would suffice, there's no
> need to count the paths using parallel_modify_partial_path_count.
>

Sounds sensible.

>
> @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> PlannerGlobal *glob = root->glob;
> int rtoffset = list_length(glob->finalrtable);
> ListCell *lc;
> + Plan *finalPlan;
>
> /*
> * Add all the query's RTEs to the flattened rangetable. The live ones
> @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> }
>
> /* Now fix the Plan tree */
> - return set_plan_refs(root, plan, rtoffset);
> + finalPlan = set_plan_refs(root, plan, rtoffset);
> + if (finalPlan != NULL && IsA(finalPlan, Gather))
> + {
> + Plan *subplan = outerPlan(finalPlan);
> +
> + if (IsA(subplan, ModifyTable) && castNode(ModifyTable, subplan)->returningLists != NULL)
> + {
> + finalPlan->targetlist = copyObject(subplan->targetlist);
> + }
> + }
> + return finalPlan;
> }
>
> I'm not sure if the problem of missing targetlist should be handled here (BTW,
> NIL is the constant for an empty list, not NULL). Obviously this is a
> consequence of the fact that the ModifyTable node has no regular targetlist.
>

I think it is better to add comments along with this change. In this
form, this looks quite hacky to me.

> Actually I don't quite understand why (in the current master branch) the
> targetlist initialized in set_plan_refs()
>
> /*
> * Set up the visible plan targetlist as being the same as
> * the first RETURNING list. This is for the use of
> * EXPLAIN; the executor won't pay any attention to the
> * targetlist. We postpone this step until here so that
> * we don't have to do set_returning_clause_references()
> * twice on identical targetlists.
> */
> splan->plan.targetlist = copyObject(linitial(newRL));
>
> is not used. Instead, ExecInitModifyTable() picks the first returning list
> again:
>
> /*
> * Initialize result tuple slot and assign its rowtype using the first
> * RETURNING list. We assume the rest will look the same.
> */
> mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);
>
> So if you set the targetlist in create_modifytable_plan() (according to
> best_path->returningLists), or even in create_modifytable_path(), and ensure
> that it gets propagated to the Gather node (generate_gather_pahs currently
> uses rel->reltarget), then you should no longer need to tweak
> setrefs.c.

This sounds worth investigating.

> Moreover, ExecInitModifyTable() would no longer need to set the
> targetlist.
>

I am not sure if we need to do anything about ExecInitModifyTable. If
we want to unify what setrefs.c does with ExecInitModifyTable, then we
can start a separate thread.

Thanks for all the reviews. I would like to emphasize what I said
earlier in this thread that it is better to first focus on
Parallelising Selects for Insert (aka what
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT does) as that
in itself is a step towards achieving parallel inserts, doing both
0001 and 0003 at the same time can take much more time as both touches
quite intricate parts of the code.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-01-08 09:21:15 Re: Improper use about DatumGetInt32
Previous Message Peter Smith 2021-01-08 09:11:48 Re: Single transaction in the tablesync worker?