Re: Parallel Inserts in CREATE TABLE AS

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-30 11:58:20
Message-ID: CALDaNm0v0Z6sNL2t=EMwyZt=UutVVGNZPNEx82cMMw4-Steyqg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 30, 2020 at 9:25 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> > w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
> >
> > + * Push the dest receiver to Gather node when it is either at the top of the
> > + * plan or under top Append node unless it does not have any projections to do.
> >
> > I think the 'unless' should be 'if'. As can be seen from the body of the method:
> >
> > + if (!ps->ps_ProjInfo)
> > + {
> > + GatherState *gstate = (GatherState *) ps;
> > +
> > + parallel = true;
>
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
>
> Please consider v18 patch set for further review.
>

A few comments:
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));

Is it possible to add a check here that the insert comes from CTAS? As
of now, we do not support inserts in parallel workers from any other
command.
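
A minimal standalone sketch of the kind of check I mean, not actual
backend code. The function name parallel_insert_allowed and the
is_ctas_insert flag are hypothetical, and the in_parallel_worker
parameter stands in for the result of IsParallelWorker(); how the patch
would actually identify a CTAS insert at this point is left open.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether an insert may proceed.
 *
 * in_parallel_worker stands in for IsParallelWorker().
 * is_ctas_insert is an assumed flag saying the insert was issued on
 * behalf of CREATE TABLE AS; the patch may track this differently.
 */
bool
parallel_insert_allowed(bool in_parallel_worker, bool is_ctas_insert)
{
    /*
     * Keep rejecting inserts from parallel workers unless they belong
     * to CTAS. In the backend, this branch would be the existing
     * ereport(ERROR, ...) call instead of a return value.
     */
    if (in_parallel_worker && !is_ctas_insert)
        return false;

    return true;
}
```

With something along these lines, the existing error would stay in
place for every other caller, and only the CTAS path would be let
through in a worker.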

+ Oid objectid; /* workers to open relation/table. */
+ /* Number of tuples inserted by all the workers. */
+ pg_atomic_uint64 processed;

The comment can just say "relation" instead of "relation/table".

+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+ explain_pictas
+----------------------------------------------------------
+ Gather (actual rows=N loops=N)
+ Workers Planned: 4
+ Workers Launched: N
+ -> Create parallel_write
+ -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;

Can we include a selection of cmin and xmin in one of the tests, to
verify that the parallel workers use the same transaction id?
Something like:
select distinct(cmin,xmin) from parallel_write;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com