Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-12 08:52:43
Message-ID: CAA4eK1JogfXUa=3wMPO+K=UiOLgHgCO7-fj1wCHsSxdaXsfVbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 12, 2020 at 12:38 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Mon, Oct 12, 2020 at 5:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > >
> > > I have not thought about this yet but I don't understand your
> > > proposal. How will you set it only for the scope of Gather (Merge)?
> > > The execution of the Gather node will be interleaved with the Insert
> > > node, basically, you fetch a tuple from Gather, and then you need to
> > > Insert it. Can you be a bit more specific on what you have in mind for
> > > this?
> > >
> >
> > One more thing I would like to add here is that we can't exit
> > parallel-mode till the workers are running (performing the scan or
> > other operation it is assigned with) and shared memory is not
> > destroyed. Otherwise, the leader can perform un-safe things like
> > assigning new commandsids or probably workers can send some error for
> > which the leader should still be in parallel-mode. So, considering
> > this I think we need quite similar checks (similar to parallel
> > inserts) to make even the Select part parallel for Inserts. If we do
> > that then you won't face many of the problems you mentioned above like
> > executing triggers that contain parallel-unsafe stuff. I feel still it
> > will be beneficial to do this as it will cover cases like Insert with
> > GatherMerge underneath it which would otherwise not possible.
> >
>
> Yes, I see what you mean, exiting parallel-mode can't be done safely
> where I had hoped it could, so looks like, even for making just the
> Select part of Insert parallel, I need to add checks (along the same
> lines as for Parallel Insert) to avoid the parallel Select in certain
> potentially-unsafe cases.
>

Right, after we take care of that, we can think of assigning xid or
things like that before we enter parallel mode. Say we have a function
like PrepareParallelMode (or PrepareEnterParallelMode) or something
like that where we can check whether we need to perform
parallel-safe-write operation (as of now Insert) and then do the
required preparation like assign xid, etc. I think this might not be
idle because it is possible that we don't fetch even a single row (say
due to filter condition) which needs to be inserted and then we will
waste xid but such cases might not occur often enough to worry.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-10-12 08:58:43 Re: [HACKERS] Custom compression methods
Previous Message Ashutosh Bapat 2020-10-12 08:44:03 Re: Remove some unnecessary if-condition