Re: Parallel Inserts in CREATE TABLE AS

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-10-14 12:46:35
Message-ID: CAA4eK1KWSVYn8hxnTKZfcKiHxhqmj842V1tK9f88mYMCGgGxtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 14, 2020 at 2:46 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > While skimming through the patch, a small thing I noticed:
> > + /*
> > + * SELECT part of the CTAS is parallelizable, so we can make
> > + * each parallel worker insert the tuples that are resulted
> > + * in it's execution into the target table.
> > + */
> > + if (!is_matview &&
> > + IsA(plan->planTree, Gather))
> > + ((DR_intorel *) dest)->is_parallel = true;
> > +
> >
> > I am not sure at this stage if this is the best way to make CTAS as
> > parallel but if so, then probably you can expand the comments a bit to
> > say why you consider only Gather node (and that too when it is the
> > top-most node) and why not another parallel node like GatherMerge?
> >
>
> If somebody expects to preserve the order of the tuples that are
> coming from GatherMerge node of the select part in CTAS or SELECT INTO
> while inserting, now if parallelism is allowed, that may not be the
> case i.e. the order of insertion of tuples may vary. I'm not quite
> sure, if someone wants to use order by in the select parts of CTAS or
> SELECT INTO in a real world use case. Thoughts?
>

I think there is no reason why one can't use ORDER BY in the
statements we are talking about here. But, I think we can't enable
parallelism for GatherMerge is because for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.

>
> >
> > Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
> >
>
> For prepared statements, the parallelism will not be picked and so is
> parallel insertion.
>

Hmm, I am not sure what makes you say this statement. The parallelism
is enabled for prepared statements since commit 57a6a72b6b.

>
> I'm listing the things that are still pending.
>
> 1. How to represent the parallel insert for CTAS in explain plans? The
> explain CTAS shows the plan for only the SELECT part. How about having
> some textual info along with the Gather node? I'm not quite sure on
> this point, any suggestions are welcome.
>

I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose by showing some example
of what you have in mind?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-10-14 12:54:39 Re: Parallel copy
Previous Message Ajin Cherian 2020-10-14 12:44:05 Re: [HACKERS] logical decoding of two-phase transactions