RE: Parallel Inserts in CREATE TABLE AS

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-15 12:18:06
Message-ID: bc400813bc8a4a0194d21098c1b8276f@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> From: Hou, Zhijie [mailto:houzj(dot)fnst(at)cn(dot)fujitsu(dot)com]
> Sent: Tuesday, December 15, 2020 7:30 PM
> To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
> Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>; Luc Vlaming <luc(at)swarm64(dot)com>;
> PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>; Zhihong Yu
> <zyu(at)yugabyte(dot)com>; Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> Subject: RE: Parallel Inserts in CREATE TABLE AS
>
> > Thanks for the append use case.
> >
> > Here's my analysis on pushing parallel inserts down even in case the
> > top node is Append.
> >
> > For union cases which need to remove duplicate tuples, we can't push
> > the inserts or CTAS dest receiver down. If I'm not wrong, Append node
> > is not doing duplicate removal(??), I saw that it's the HashAggregate
> > node (which is the top node that removes the duplicate tuples). And
> > also for except/except all/intersect/intersect all cases we receive
> > HashSetOp nodes on top of Append. So for both cases, our check for
> > Gather or Append at the top node is enough to detect this to not allow
> parallel inserts.
> >
> > For union all:
> > case 1: We can push the CTAS dest receiver to each Gather node Append
> > ->Gather
> > ->Parallel Seq Scan
> > ->Gather
> > ->Parallel Seq Scan
> > ->Gather
> > ->Parallel Seq Scan
> >
> > case 2: We can still push the CTAS dest receiver to each Gather node.
> > Non-Gather nodes will do inserts as they do now i.e. by sending tuples
> > to Append and from there to CTAS dest receiver.
> > Append
> > ->Gather
> > ->Parallel Seq Scan
> > ->Seq Scan / Join / any other non-Gather node
> > ->Gather
> > ->Parallel Seq Scan
> > ->Seq Scan / Join / any other non-Gather node
> >
> > case 3: We can push the CTAS dest receiver to Gather Gather
> > ->Parallel Append
> > ->Parallel Seq Scan
> > ->Parallel Seq Scan
> >
> > case 4: We can push the CTAS dest receiver to Gather Gather
> > ->Parallel Append
> > ->Parallel Seq Scan
> > ->Parallel Seq Scan
> > ->Seq Scan / Join / any other non-Gather node
> >
> > Please let me know if I'm missing any other possible use case.
> >
> > Thoughts?
>
>
> Yes, The analysis looks right to me.
>
>
> > As suggested by Amit earlier, I kept the 0001 patch(so far) such that
> > it doesn't have the code to influence the planner to consider parallel
> > tuple cost as 0. It works on the plan whatever gets generated and
> > decides to allow parallel inserts or not. And in the 0002 patch, I
> > added the code for influencing the planner to consider parallel tuple
> > cost as 0. Maybe we can have a 0003 patch for tests alone.
> >
> > Once we are okay with the above analysis and use cases, we can
> > incorporate the Append changes to respective patches.
> >
> > Hope that's okay.
>
> A little explanation about how to push down the ctas info in append.
>
> 1. about how to ignore tuple cost in this case.
> IMO, it create gather path under append like the following:
> query_planner
> -make_one_rel
> --set_base_rel_sizes
> ---set_rel_size
> ----set_append_rel_size (*)
> -----set_rel_size
> ------set_subquery_pathlist
> -------subquery_planner
> --------grouping_planner
> ---------apply_scanjoin_target_to_paths
> ----------generate_useful_gather_paths
>
> set_append_rel_size seems the right place where we can check and set a flag
> to ignore tuple cost later.
> We can set the flag for two cases when there is no parent path will be
> created(such as : limit,sort,distinct...):
> i) query_level is 1
> ii) query_level > 1 and we have set the flag in the parent_root.
>
> The case ii) is to check append under append:
> Append
> ->Append
> ->Gather
> ->Other plan
>
> 2.about how to push ctas info down.
>
> We traversing the whole plans tree, and we only care Append and Gather type.
> Gather: It set the ctas dest info and returned true at once if the gathernode
> does not have projection.
> Append: It will recursively traversing the subplan of Appendnode and will
> reture true if one of the subplan can be parallel.
>
> +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) {
> + bool parallel = false;
> +
> + if(ps == NULL)
> + return parallel;
> +
> + if(IsA(ps, AppendState))
> + {
> + AppendState *aps = (AppendState *) ps;
> + for(int i = 0; i < aps->as_nplans; i++)
> + {
> + parallel |=
> PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
> + }
> + }
> + else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
> + {
> + GatherState *gstate = (GatherState *) ps;
> + parallel = true;
> +
> + ((DR_intorel *) dest)->is_parallel = true;
> + gstate->dest = dest;
> + ps->plan->plan_rows = 0;
> + }
> +
> + return parallel;
> +}

So sorry for my miss, my last patch has some mistakes.
Attatch the new one.

Best regards,
houzj

Attachment Content-Type Size
0001-support-pctas-in-append-parallel-inserts.patch application/octet-stream 2.9 KB
0002-support-pctas-in-append-tuple-cost-adjustment.patch application/octet-stream 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-15 12:23:22 Re: Parallel Inserts in CREATE TABLE AS
Previous Message Hou, Zhijie 2020-12-15 11:29:46 RE: Parallel Inserts in CREATE TABLE AS