Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-09 10:20:01
Message-ID: CAJcOf-excOA7qDEDeQJygXy7211dAHkiFBFUqTETqcOvsph9dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> 0001-InsertParallelSelect
> 1.
> ParallelContext *pcxt;
>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right?

Yes, agreed, is a hack to avoid that (mind you, it's not exactly great
that ExecutePlan() sets parallel-mode for the entire plan execution).
Also, did not expect that to necessarily remain in a final patch.

>If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

No, should be OK I guess, but will update and test to be sure.

> 2.
> @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
> */
> if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> IsUnderPostmaster &&
> - parse->commandType == CMD_SELECT &&
> + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
> !parse->hasModifyingCTE &&
> max_parallel_workers_per_gather > 0 &&
> !IsParallelWorker())
>
> I think the comments above this need to be updated especially the part
> where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and
> CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> the leader backend writes into a completely new table.". Don't we need
> to include Insert also?

Yes, Insert needs to be mentioned somewhere there.

>
> 3.
> @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
> * parallel-unsafe, or else the query planner itself has a bug.
> */
> glob->parallelModeNeeded = glob->parallelModeOK &&
> + (parse->commandType == CMD_SELECT) &&
> (force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Why do you need this change? The comments above this code should be
> updated to reflect this change. I think for the same reason the below
> code seems to be modified but I don't understand the reason for the
> below change as well, also it is better to update the comments for
> this as well.
>

OK, I will update the comments for this.
Basically, up to now, the "force_parallel_mode" has only ever operated
on a SELECT.
But since we are now allowing CMD_INSERT to be assessed for parallel
mode too, we need to prevent the force_parallel_mode logic from
sticking a Gather node over the top of arbitrary INSERTs and causing
them to be run in parallel. Not all INSERTs are suitable for parallel
operation, and also there are further considerations for
parallel-safety for INSERTs compared to SELECT. INSERTs can also
trigger UPDATEs.
If we need to support force_parallel_mode for INSERT, more work will
need to be done.

> @@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
> * Optionally add a Gather node for testing purposes, provided this is
> * actually a safe thing to do.
> */
> - if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
> + if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
> == CMD_SELECT && top_plan->parallel_safe)
> {
> Gather *gather = makeNode(Gather);
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com
>

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-10-09 10:20:07 Re: Parallel copy
Previous Message Pavel Stehule 2020-10-09 10:17:32 Re: SEARCH and CYCLE clauses