Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-03-03 09:46:17
Message-ID: CAA4eK1LXHX3a+GrBTFcu6P5iw8mpQCDmT+wkWUH5FRf6xj4TmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 3, 2021 at 12:52 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Tue, Mar 2, 2021 at 11:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> > 2.
> > /*
> > + * Prepare for entering parallel mode by assigning a
> > + * FullTransactionId, to be included in the transaction state that is
> > + * serialized in the parallel DSM.
> > + */
> > + (void) GetCurrentTransactionId();
> > + }
> >
> > Similar to the previous comment this also seems to indicate that we
> > require TransactionId for workers. If that is not the case then this
> > comment also needs to be modified.
> >
>
> I'll update to comment to remove the part about the serialization (as
> this always happens, not a function of the patch) and mention it is
> needed to avoid attempting to assign a FullTransactionId in
> parallel-mode.
>

Okay, but just use TransactionId in comments if there is a need, we
still don't use FullTransactionId for the heap.

> > 4.
> > * Assess whether it's feasible to use parallel mode for this query. We
> > * can't do this in a standalone backend, or if the command will try to
> > - * modify any data, or if this is a cursor operation, or if GUCs are set
> > - * to values that don't permit parallelism, or if parallel-unsafe
> > - * functions are present in the query tree.
> > + * modify any data using a CTE, or if this is a cursor operation, or if
> > + * GUCs are set to values that don't permit parallelism, or if
> > + * parallel-unsafe functions are present in the query tree.
> >
> > This comment change is not required because this is quite similar to
> > what we do for CTAS. Your further comment changes in this context are
> > sufficient.
>
> An INSERT modifies data, so according to the original comment, then
> it's not feasible to use parallel mode, because the command tries to
> modify data ("We can't do this in a standalone backend, or if the
> command will try to modify any data ...").
> Except now we need to use parallel-mode for "INSERT with parallel
> SELECT", and INSERT is a command that modifies data.
> So isn't the comment change actually needed?
>

If we want to change, we can say "if the command will try to modify
any data except for inserts ..." or something like that but saying
only about CTE is not correct because then what about updates and
deletes.

>
> >
> > 5.
> > + (IsModifySupportedInParallelMode(parse->commandType) &&
> > + is_parallel_possible_for_modify(parse))) &&
> >
> > I think it would be better if we move the check
> > IsModifySupportedInParallelMode inside
> > is_parallel_possible_for_modify.
>
> The reason it is done outside of is_parallel_possible_for_modify() is
> to avoid the function overhead of calling
> is_parallel_possible_for_modify() for SELECT statements, only to
> return without doing anything. Note also that
> IsModifySupportedInParallelMode() is an inline function.
>

I don't see any reason to be worried about that here. It is more
important for code and checks to look simpler.

> >Also, it might be better to name this
> > function as is_parallel_allowed_for_modify.
>
> I do tend to think that in this case "possible" is better than "allowed".
> Only the "parallel_dml" GUC test is checking for something that is "allowed".
> The other two checks are checking for things that determine whether
> parallel-mode is even "possible".
>

I think I don't like this proposed name for the function. See, if you
can think of something better.

> >
> > 6.
> > @@ -260,6 +265,21 @@ set_plan_references(PlannerInfo *root, Plan *plan)
> > */
> > add_rtes_to_flat_rtable(root, false);
> >
> > + /*
> > + * If modifying a partitioned table, add its parallel-safety-checked
> > + * partitions too to glob->relationOids, to register them as plan
> > + * dependencies. This is only really needed in the case of a parallel
> > + * plan, so that if parallel-unsafe properties are subsequently defined
> > + * on the partitions, the cached parallel plan will be invalidated and
> > + * a non-parallel plan will be generated.
> > + */
> > + if (IsModifySupportedInParallelMode(root->parse->commandType))
> > + {
> > + if (glob->partitionOids != NIL && glob->parallelModeNeeded)
> > + glob->relationOids =
> > + list_concat(glob->relationOids, glob->partitionOids);
> > + }
> > +
> >
> > Isn't it possible to add these partitionOids in set_plan_refs with the
> > T_Gather(Merge) node handling? That looks like a more natural place to
> > me, if that is feasible then we don't need parallelModeNeeded check
> > and maybe we don't need to even check IsModifySupportedInParallelMode
> > but I am less sure of the second check requirement.
> >
>
> There may be multiple Gather/GatherMerge nodes in the plan (when they
> are not top-level nodes), and I think by moving this code into
> set_plan_refs() you risk adding those partitionOids multiple times to
> glob->relationOids, when the Gather/GatherMerge nodes are traversed
> (note that set_plan_refs() is called from set_plan_references() and is
> recursive).
> Leaving the code where it is is set_plan_references() guarantees that
> the partitionOids can only be added ONCE.
>

Okay, is there a reason to use IsModifySupportedInParallelMode? Isn't
the second check sufficient?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-03-03 09:52:10 Re: Increase value of OUTER_VAR
Previous Message Magnus Hagander 2021-03-03 09:39:55 Re: PROXY protocol support