Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-07 10:36:53
Message-ID: CAA4eK1KZMEGEu5Q9iWH=hUY0p4VjOAdOjWfuOKhoNZVXK3LmtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> > Posting an updated set of patches to address recent feedback:
>
> Following is my review.
>
> v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> --------------------------------------------------------------
>
> @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
> /* all the cheap tests pass, so scan the query tree */
> glob->maxParallelHazard = max_parallel_hazard(parse);
> glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> +
> + /*
> + * Additional parallel-mode safety checks are required in order to
> + * allow an underlying parallel query to be used for a
> + * table-modification command that is supported in parallel-mode.
> + */
> + if (glob->parallelModeOK &&
> + IsModifySupportedInParallelMode(parse->commandType))
> + {
> + glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> + glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> + }
>
> Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> functions should not be called by parallel worker.
>

What in the above change indicates that the parallel_restricted will
be allowed in parallel workers. This just sets paralleModeOK to allow
parallel plans for Selects if the Insert can be performed safely in a
leader backend.

>
> @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
> }
>
> /*
> + * PrepareParallelMode
> + *
> + * Prepare for entering parallel mode, based on command-type.
> + */
> +void
> +PrepareParallelMode(CmdType commandType)
> +{
> + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Isn't the test of force_parallel_mode just a hack to make regression tests
> pass? When I removed this part and ran the regression tests with
> force_parallel_mode=regress, the assertion fired when executing a subquery
> because the executor was already in parallel mode due to the main query
> execution.
>

I think this Assert is bogus. We are allowed to enter in parallel-mode
if we are already in parallel-mode, see EnterParallelMode. But we
shouldn't be allowed allocate xid in parallel-mode. So the
Assert(!IsInParallelMode()) should be moved inside the check if
(IsModifySupportedInParallelMode(commandType)) in this function. Can
you check if it still fails after such a modification?

> As an alternative, have you considered allocation of the XID even in parallel
> mode? I imagine that the first parallel worker that needs the XID for
> insertions allocates it and shares it with the other workers as well as with
> the leader process.
>

As a matter of this patch
(v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
never need to allocate xids by workers because Insert is always
performed by leader backend. Even, if we want to do what you are
suggesting it would be tricky because currently, we don't have such an
infrastructure where we can pass information among workers.

> One problem of the current patch version is that the "INSERT INTO ... SELECT
> ..." statement consumes XID even if the SELECT eventually does not return any
> row. However, if the same query is processed w/o parallelism, the XID is only
> allocated if at least one tuple needs to be inserted.
>

Yeah, that is true but I think this can happen w/o parallelism for
updates and deletes where by the time we try to modify the row, it got
modified by a concurrent session and the first session will needlessly
allocate XID.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-01-07 10:53:17 Re: Prevent printing "next step instructions" in initdb and pg_upgrade
Previous Message Bharath Rupireddy 2021-01-07 09:53:02 Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW