Re: Parallel Inserts in CREATE TABLE AS

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-14 14:54:56
Message-ID: CAFiTN-ve-aX40Q4znjz8TtczwNJ920tpV2Pj4zMdJ-JRK3BSfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 14, 2020 at 4:06 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Dec 10, 2020 at 7:20 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > On Thu, Dec 10, 2020 at 5:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > > + allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > > > > > + plannedstmt->parallelModeNeeded &&
> > > > > > + plannedstmt->planTree &&
> > > > > > + IsA(plannedstmt->planTree, Gather) &&
> > > > > > + plannedstmt->planTree->lefttree &&
> > > > > > + plannedstmt->planTree->lefttree->parallel_aware &&
> > > > > > + plannedstmt->planTree->lefttree->parallel_safe;
> > > > > >
> > > > > > I noticed it check both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather).
> > > > > > Does it mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false ?
> > > > > >
> > > > > > I did some test but did not find a case like that.
> > > > > >
> > > > > This seems like an extra check. Apart from that if we combine 0001
> > > > > and 0002 there should be an additional protection so that it should
> > > > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > > > and now we are rejecting the parallel insert. Probably we should add
> > > > > an assert.
> > > >
> > > > Yeah it's an extra check. I don't think we need that extra check IsA(plannedstmt->planTree, Gather). GatherState check is enough. I verified it as follows: the gatherstate will be allocated and initialized with the plan tree in ExecInitGather which are the ones we are checking here. So, there is no chance that the plan state is GatherState and the plan tree will not be Gather. I will remove IsA(plannedstmt->planTree, Gather) check in the next version of the patch set.
> > > >
> > > > Breakpoint 4, ExecInitGather (node=0x5647f98ae994 <ExecCheckRTEPerms+131>, estate=0x1ca8, eflags=730035099) at nodeGather.c:61
> > > > (gdb) p gatherstate
> > > > $10 = (GatherState *) 0x5647fac83850
> > > > (gdb) p gatherstate->ps.plan
> > > > $11 = (Plan *) 0x5647fac918a0
> > > >
> > > > Breakpoint 1, IsParallelInsertInCTASAllowed (into=0x5647fac97580, queryDesc=0x5647fac835e0) at createas.c:663
> > > > 663 {
> > > > (gdb) p ps
> > > > $13 = (PlanState *) 0x5647fac83850
> > > > (gdb) p ps->plan
> > > > $14 = (Plan *) 0x5647fac918a0
> > > >
> > > Hope you did not miss the second part of my comment
> > > "
> > > > Apart from that if we combine 0001
> > > > and 0002 there should be additional protection so that it should
> > > > not happen that in cost_gather we have ignored the parallel tuple cost
> > > > and now we are rejecting the parallel insert. Probably we should add
> > > > an assert.
> > > "
> >
> > IIUC, we need to set a flag in cost_gather(in 0002 patch) whenever we
> > ignore the parallel tuple cost and while checking to allow or disallow
> > parallel inserts in IsParallelInsertInCTASAllowed(), we need to add an
> > assert something like Assert(cost_ignored_in_cost_gather && allow)
> > before return allow;
> >
> > This assertion fails 1) either if we have not ignored the cost but
> > allowing parallel inserts 2) or we ignored the cost but not allowing
> > parallel inserts.
> >
> > 1) seems to be fine, we can go ahead and perform parallel inserts. 2)
> > is the concern that the planner would have wrongly chosen the parallel
> > plan, but in this case also isn't it better to go ahead with the
> > parallel plan instead of failing the query?
> >
> > + /*
> > + * We allow parallel inserts by the workers only if the Gather node has
> > + * no projections to perform and if the upper node is Gather. In case,
> > + * the Gather node has projections, which is possible if there are any
> > + * subplans in the query, the workers can not do those projections. And
> > + * when the upper node is GatherMerge, then the leader has to perform
> > + * the final phase i.e. merge the results by workers.
> > + */
> > + allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
> > + plannedstmt->parallelModeNeeded &&
> > + plannedstmt->planTree &&
> > + plannedstmt->planTree->lefttree &&
> > + plannedstmt->planTree->lefttree->parallel_aware &&
> > + plannedstmt->planTree->lefttree->parallel_safe;
> > +
> > + return allow;
> > + }
>
> I added the assertion into the 0002 patch so that it fails when the
> planner ignores parallel tuple cost and may choose parallel plan but
> later we don't allow parallel inserts. make check and make check-world
> passeses without any assertion failures.
>
> Attaching v11 patch set. Please review it further.

I can see a lot of unrelated changes in 0002, or you have done a lot
of code refactoring especially in createas.c file. If it is intended
refactoring then please move the refactoring to a separate patch so
that the patch is readable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Lukas Meisegeier 2020-12-14 15:01:09 Re: Feature Proposal: Add ssltermination parameter for SNI-based LoadBalancing
Previous Message Fujii Masao 2020-12-14 14:33:53 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit