Re: Parallel INSERT (INTO ... SELECT ...)

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-08 06:53:45
Message-ID: 22691.1610088825@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> >
> > > Posting an updated set of patches to address recent feedback:
> >
> > Following is my review.
> >
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch
> > --------------------------------------------------------------
> >
> > @@ -342,6 +343,18 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
> > /* all the cheap tests pass, so scan the query tree */
> > glob->maxParallelHazard = max_parallel_hazard(parse);
> > glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > +
> > + /*
> > + * Additional parallel-mode safety checks are required in order to
> > + * allow an underlying parallel query to be used for a
> > + * table-modification command that is supported in parallel-mode.
> > + */
> > + if (glob->parallelModeOK &&
> > + IsModifySupportedInParallelMode(parse->commandType))
> > + {
> > + glob->maxParallelHazard = max_parallel_hazard_for_modify(parse, &glob->maxParallelHazard);
> > + glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> > + }
> >
> > Is it really ok to allow PROPARALLEL_RESTRICTED? Per definition, these
> > functions should not be called by parallel worker.
> >
>
> What in the above change indicates that the parallel_restricted will
> be allowed in parallel workers. This just sets paralleModeOK to allow
> parallel plans for Selects if the Insert can be performed safely in a
> leader backend.

Well, this is just the initial setting, while the distinction between "Gather
-> Insert -> ..." and "Insert -> Gather -> ..." is made later. So I withdraw
my objection.

> >
> > @@ -1015,6 +1016,27 @@ IsInParallelMode(void)
> > }
> >
> > /*
> > + * PrepareParallelMode
> > + *
> > + * Prepare for entering parallel mode, based on command-type.
> > + */
> > +void
> > +PrepareParallelMode(CmdType commandType)
> > +{
> > + Assert(!IsInParallelMode() || force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Isn't the test of force_parallel_mode just a hack to make regression tests
> > pass? When I removed this part and ran the regression tests with
> > force_parallel_mode=regress, the assertion fired when executing a subquery
> > because the executor was already in parallel mode due to the main query
> > execution.
> >
>
> I think this Assert is bogus. We are allowed to enter in parallel-mode
> if we are already in parallel-mode, see EnterParallelMode.

Right.

> But we shouldn't be allowed allocate xid in parallel-mode. So the
> Assert(!IsInParallelMode()) should be moved inside the check if
> (IsModifySupportedInParallelMode(commandType)) in this function. Can you
> check if it still fails after such a modification?

Yes, this works.

> > As an alternative, have you considered allocation of the XID even in parallel
> > mode? I imagine that the first parallel worker that needs the XID for
> > insertions allocates it and shares it with the other workers as well as with
> > the leader process.
> >
>
> As a matter of this patch
> (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we
> never need to allocate xids by workers because Insert is always
> performed by leader backend.

When writing this comment, I was actually thinking of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather
than v11-0001, see below. On the other hand, if we allowed XID allocation in
the parallel mode (as a separate patch), even the 0001 patch would get a bit
simpler.

> Even, if we want to do what you are suggesting it would be tricky because
> currently, we don't have such an infrastructure where we can pass
> information among workers.

How about barriers (storage/ipc/barrier.c)? What I imagine is that all the
workers "meet" at the barrier when they want to insert the first tuple. Then
one of them allocates the XID, makes it available to others (via shared
memory) and all the workers can continue.

> > One problem of the current patch version is that the "INSERT INTO ... SELECT
> > ..." statement consumes XID even if the SELECT eventually does not return any
> > row. However, if the same query is processed w/o parallelism, the XID is only
> > allocated if at least one tuple needs to be inserted.
> >

> Yeah, that is true but I think this can happen w/o parallelism for
> updates and deletes where by the time we try to modify the row, it got
> modified by a concurrent session and the first session will needlessly
> allocate XID.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-08 07:11:53 Re: Refactoring HMAC in the core code
Previous Message Amit Langote 2021-01-08 06:49:03 Re: Huge memory consumption on partitioned table with FKs