Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 23:24:27
Message-ID: CAJcOf-eJzT3bbQhuRqkMoiZJ0g1kRkvDOFawt-MqBNLG_T2oMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 10, 2021 at 8:59 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>
> > > >
> > > > else if (IsA(node, Query))
> > > > {
> > > > Query *query = (Query *) node;
> > > >
> > > > /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > > > if (query->rowMarks != NULL)
> > > > {
> > > > context->max_hazard = PROPARALLEL_UNSAFE;
> > > > return true;
> > > > }
> > >
> > > In my understanding, the max_parallel_hazard() query tree walk is to
> > > find constructs that are parallel unsafe in that they call code that
> > > can't run in parallel mode. For example, FOR UPDATE/SHARE on
> > > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > > support being called in parallel mode. Likewise ON CONFLICT ... DO
> > > UPDATE calls heap_update() which doesn't support parallelism. I'm not
> > > aware of any such hazards downstream of ExecValuesScan().
> > >
> > > > >You're trying to
> > > > > add something that bails based on second-guessing that a parallel
> > > > >path would not be chosen, which I find somewhat objectionable.
> > > > >
> > > > > If the main goal of bailing out is to avoid doing the potentially
> > > > > expensive modification safety check on the target relation, maybe
> > > > > we should try to somehow make the check less expensive. I
> > > > > remember reading somewhere in the thread about caching the result
> > > > > of this check in relcache, but haven't closely studied the feasibility
> > of doing so.
> > > > >
> > > >
> > > > There's no "second-guessing" involved here.
> > > > There is no underlying way of dividing up the VALUES data of
> > > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > > was updated to produce a parallel-plan for the "INSERT...VALUES"
> > > > case (apart from the fact that spawning off parallel workers to
> > > > insert that data would almost always result in worse performance
> > > > than a non-parallel plan...) The division of work for parallel
> > > > workers is part of the table AM
> > > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> > >
> > > I don't disagree that the planner would not normally assign a parallel
> > > path simply to pull values out of a VALUES list mentioned in the
> > > INSERT command, but deciding something based on the certainty of it in
> > > an earlier planning phase seems odd to me. Maybe that's just me
> > > though.
> > >
> >
> > I think it is more of a case where neither is a need for parallelism nor
> > we want to support parallelism of it. The other possibility for such a check
> > could be at some earlier phase say in standard_planner [1] where we are
> > doing checks for other constructs where we don't want parallelism (I think
> > the check for 'parse->hasModifyingCTE' is quite similar). If you see in
> > that check as well we just assume other operations to be in the category
> > of parallel-unsafe. I think we should rule out such cases earlier than later.
> > Do you have better ideas than what Greg has done to avoid parallelism for
> > such cases?
> >
> > [1] -
> > standard_planner()
> > {
> > ..
> > if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster &&
> > parse->commandType == CMD_SELECT &&
> > !parse->hasModifyingCTE &&
> > max_parallel_workers_per_gather > 0 &&
> > !IsParallelWorker())
> > {
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse); parallelModeOK =
> > glob->(glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > }
> > else
> > {
> > /* skip the query tree scan, just assume it's unsafe */
> > glob->maxParallelHazard = PROPARALLEL_UNSAFE; parallelModeOK = false;
> > }
>
> +1.
>
> In the current parallel_dml option patch. I put this check and some high-level check in a separate function called is_parallel_possible_for_modify.
>
>
> - * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
> + * Check at a high-level if parallel mode is able to be used for the specified
> + * table-modification statement.
> + * It's not possible in the following cases:
> + *
> + * 1) enable_parallel_dml is off
> + * 2) UPDATE or DELETE command
> + * 3) INSERT...ON CONFLICT...DO UPDATE
> + * 4) INSERT without SELECT on a relation
> + * 5) the reloption parallel_dml_enabled is not set for the target table
> + *
> + * (Note: we don't do in-depth parallel-safety checks here, we do only the
> + * cheaper tests that can quickly exclude obvious cases for which
> + * parallelism isn't supported, to avoid having to do further parallel-safety
> + * checks for these)
> */
> +bool
> +is_parallel_possible_for_modify(Query *parse)
>
>
>
>
>
> And I put the function at earlier place like the following:
>
>
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> IsUnderPostmaster &&
> (parse->commandType == CMD_SELECT ||
> - (enable_parallel_dml &&
> - IsModifySupportedInParallelMode(parse->commandType))) &&
> + is_parallel_possible_for_modify(parse)) &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
>
>
> If this looks good, maybe we can merge this change.
>

If I've understood correctly, you're suggesting to merge the
"is_parallel_possible_for_modify()" code from your parallel_dml patch
into the main parallel INSERT patch, right?
If so, I think that's a good idea, as it will help simplify both
patches (and then, if need be, we can still discuss where best to
place certain checks ...).
I'll post an update soon that includes that change (then the
parallel_dml patch will need to be rebased accordingly).

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2021-02-10 23:27:45 Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Previous Message Ranier Vilela 2021-02-10 23:12:38 Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)