Re: Parallel INSERT (INTO ... SELECT ...)

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-05 11:15:04
Message-ID: CAFiTN-unMPMZDULHvwCcasN-+pZRrZsGr27CTZcYG_xoirxpmA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 5, 2020 at 4:26 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Sep 30, 2020 at 7:38 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> >
> > > >
> > > > I think you still need to work on the costing part. Basically, if we
> > > > are parallelizing the whole insert, then the plan looks like this:
> > > >
> > > > -> Gather
> > > >      -> Parallel Insert
> > > >           -> Parallel Seq Scan
> > > >
> > > > That means the tuples we select via the scan are not sent back to
> > > > the Gather node, so in cost_gather we need to check whether it is for
> > > > an INSERT; if so, no rows are transferred through the parallel queue,
> > > > which means we need not pay any parallel tuple cost.
> > >
> > > I just looked into the parallel CTAS[1] patch for the same thing, and
> > > I can see that it is handled in that patch.
> > >
> > > [1] https://www.postgresql.org/message-id/CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf%2B0-Jg%2BKYT7ZO-Ug%40mail.gmail.com
> > >
> >
> > Hi Dilip,
> >
> > You're right, the costing for Parallel Insert is not finished. I'm
> > still working on it and haven't posted an updated patch for it yet.
> > As far as the cost_gather() method is concerned, for Parallel INSERT it
> > can probably use the same costing approach as the CTAS patch, except in
> > the case of a specified RETURNING clause.
> >
>
> I have one question which is common to both this patch and parallel
> inserts in CTAS[1]: do we need to skip creating tuple queues
> (ExecParallelSetupTupleQueues), given that no tuples are being
> shared from workers to leader? To put it another way, do we
> use the tuple queue for sharing any info other than tuples from
> workers to leader?

Ideally, we don't need the tuple queue unless we want to transfer
tuples to the Gather node.
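A minimal sketch, in C to match PostgreSQL's source, of the two decisions discussed in this thread: charging the per-tuple transfer cost in Gather costing only when workers actually send rows to the leader, and creating tuple queues only in that case. Every struct and function name below is hypothetical; this is not the code from either patch, nor PostgreSQL's actual cost_gather() or ExecParallelSetupTupleQueues(). It assumes, as noted above, that an INSERT with a RETURNING clause still sends rows back. The two constants mirror the default values of the real parallel_setup_cost and parallel_tuple_cost settings.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults of the real GUCs parallel_setup_cost and parallel_tuple_cost. */
#define DEFAULT_PARALLEL_SETUP_COST 1000.0
#define DEFAULT_PARALLEL_TUPLE_COST 0.1

/* Hypothetical, simplified view of the path underneath a Gather node. */
typedef struct SketchSubpath
{
    double startup_cost;
    double total_cost;
    double rows;                /* rows produced by the worker-side plan */
} SketchSubpath;

/* Hypothetical description of what the workers do below the Gather. */
typedef struct SketchGatherInfo
{
    bool is_parallel_insert;    /* workers perform the INSERT themselves */
    bool has_returning;         /* INSERT ... RETURNING sends rows back */
} SketchGatherInfo;

/*
 * Do workers send any tuples to the leader?  For a plain parallel scan,
 * yes.  For a parallel INSERT without RETURNING, no: the inserted rows
 * stay in the workers, so nothing crosses the tuple queues.
 */
bool
workers_return_tuples(const SketchGatherInfo *info)
{
    return !info->is_parallel_insert || info->has_returning;
}

/*
 * Simplified Gather costing: always pay the setup cost, but pay the
 * per-tuple transfer cost only when tuples actually flow to the leader.
 * Returns the total cost.
 */
double
sketch_gather_total_cost(const SketchSubpath *sub,
                         const SketchGatherInfo *info,
                         double setup_cost, double tuple_cost)
{
    double startup_cost = sub->startup_cost + setup_cost;
    double run_cost = sub->total_cost - sub->startup_cost;

    assert(sub->rows >= 0.0);
    if (workers_return_tuples(info))
        run_cost += tuple_cost * sub->rows;

    return startup_cost + run_cost;
}

/*
 * Number of tuple queues to create: one per worker when tuples flow
 * back, none otherwise (a hypothetical analogue of deciding whether the
 * tuple-queue setup step is needed at all).
 */
int
sketch_tuple_queues_needed(const SketchGatherInfo *info, int nworkers)
{
    return workers_return_tuples(info) ? nworkers : 0;
}
```

For a subpath with startup cost 10, total cost 110 and 1000 rows, the sketch charges 1210 for a plain parallel scan or an INSERT ... RETURNING, but 1110 for a parallel INSERT without RETURNING, and asks for zero tuple queues in the latter case.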

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
