RE: Parallel Inserts in CREATE TABLE AS

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-15 11:29:46
Message-ID: 4ab802f593b04185a43d2b85f2fd5966@G08CNEXMBPEKD05.g08.fujitsu.local
Lists: pgsql-hackers

> Thanks for the append use case.
>
> Here's my analysis on pushing parallel inserts down even in case the top
> node is Append.
>
> For union cases which need to remove duplicate tuples, we can't push the
> inserts or CTAS dest receiver down. If I'm not wrong, Append node is not
> doing duplicate removal(??), I saw that it's the HashAggregate node (which
> is the top node that removes the duplicate tuples). And also for
> except/except all/intersect/intersect all cases we receive HashSetOp nodes
> on top of Append. So for both cases, our check for Gather or Append at the
> top node is enough to detect this to not allow parallel inserts.
>
> For union all:
> case 1: We can push the CTAS dest receiver to each Gather node
> Append
>     ->Gather
>         ->Parallel Seq Scan
>     ->Gather
>         ->Parallel Seq Scan
>     ->Gather
>         ->Parallel Seq Scan
>
> case 2: We can still push the CTAS dest receiver to each Gather node.
> Non-Gather nodes will do inserts as they do now i.e. by sending tuples to
> Append and from there to CTAS dest receiver.
> Append
>     ->Gather
>         ->Parallel Seq Scan
>     ->Seq Scan / Join / any other non-Gather node
>     ->Gather
>         ->Parallel Seq Scan
>     ->Seq Scan / Join / any other non-Gather node
>
> case 3: We can push the CTAS dest receiver to Gather
> Gather
>     ->Parallel Append
>         ->Parallel Seq Scan
>         ->Parallel Seq Scan
>
> case 4: We can push the CTAS dest receiver to Gather
> Gather
>     ->Parallel Append
>         ->Parallel Seq Scan
>         ->Parallel Seq Scan
>         ->Seq Scan / Join / any other non-Gather node
>
> Please let me know if I'm missing any other possible use case.
>
> Thoughts?

Yes, the analysis looks right to me.

> As suggested by Amit earlier, I kept the 0001 patch(so far) such that it
> doesn't have the code to influence the planner to consider parallel tuple
> cost as 0. It works on the plan whatever gets generated and decides to allow
> parallel inserts or not. And in the 0002 patch, I added the code for
> influencing the planner to consider parallel tuple cost as 0. Maybe we can
> have a 0003 patch for tests alone.
>
> Once we are okay with the above analysis and use cases, we can incorporate
> the Append changes to respective patches.
>
> Hope that's okay.

Here is a brief explanation of how to push the CTAS info down through Append.

1. About how to ignore the tuple cost in this case.
IMO, the Gather path under Append is created through the following call chain:
query_planner
-make_one_rel
--set_base_rel_sizes
---set_rel_size
----set_append_rel_size (*)
-----set_rel_size
------set_subquery_pathlist
-------subquery_planner
--------grouping_planner
---------apply_scanjoin_target_to_paths
----------generate_useful_gather_paths

set_append_rel_size seems to be the right place to check and set a flag that tells the later steps to ignore the tuple cost.
We can set the flag in the following two cases, provided that no parent path (such as Limit, Sort, Distinct...) will be created on top:
i) query_level is 1
ii) query_level > 1 and the flag has already been set in the parent_root.

Case ii) covers Append under Append:
Append
->Append
->Gather
->Other plan
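
To make the two conditions above concrete, here is a minimal standalone sketch of the decision. It is not code from the attached patches: ToyPlannerInfo, the ignore_parallel_tuple_cost field and toy_should_ignore_tuple_cost are hypothetical names made up for illustration. Only query_level and parent_root mirror real PlannerInfo fields, and parent_path_needed is an assumed input standing for "a Limit/Sort/Distinct... path will be added on top".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for PlannerInfo. */
typedef struct ToyPlannerInfo
{
    int         query_level;    /* 1 for the outermost query */
    struct ToyPlannerInfo *parent_root; /* NULL at the outermost level */
    bool        ignore_parallel_tuple_cost; /* assumed flag name */
} ToyPlannerInfo;

/*
 * Decide whether the flag may be set for this query level.
 * parent_path_needed is an assumed input: true when some path
 * (Limit, Sort, Distinct...) will be created above the Append.
 */
bool
toy_should_ignore_tuple_cost(const ToyPlannerInfo *root,
                             bool parent_path_needed)
{
    if (root == NULL || parent_path_needed)
        return false;

    /* Case i): the top-level query. */
    if (root->query_level == 1)
        return true;

    /* Case ii): a subquery whose parent already has the flag set. */
    return root->query_level > 1 &&
        root->parent_root != NULL &&
        root->parent_root->ignore_parallel_tuple_cost;
}
```

With this shape, the inner Append of an Append-under-Append plan only gets the flag if the outer level got it first, which is the point of case ii).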

2. About how to push the CTAS info down.

We traverse the whole plan tree, and we only care about Append and Gather nodes.
Gather: if the Gather node has no projection, set the CTAS dest info on it and return true immediately.
Append: recursively traverse the subplans of the Append node and return true if at least one of the subplans can do the insert in parallel.

+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+    bool parallel = false;
+
+    if(ps == NULL)
+        return parallel;
+
+    if(IsA(ps, AppendState))
+    {
+        AppendState *aps = (AppendState *) ps;
+        for(int i = 0; i < aps->as_nplans; i++)
+        {
+            parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+        }
+    }
+    else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+    {
+        GatherState *gstate = (GatherState *) ps;
+        parallel = true;
+
+        ((DR_intorel *) dest)->is_parallel = true;
+        gstate->dest = dest;
+        ps->plan->plan_rows = 0;
+    }
+
+    return parallel;
+}
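
For anyone who wants to play with the traversal outside the backend, the same logic can be sketched on a toy tree. This is only an illustration under stated assumptions: ToyNode, ToyTag and toy_push_down are hypothetical names, has_projection stands in for ps->ps_ProjInfo != NULL, and got_dest stands in for gstate->dest being set. It does not model DR_intorel or plan_rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PlanState node tags we care about. */
typedef enum ToyTag
{
    TOY_APPEND,
    TOY_GATHER,
    TOY_OTHER                   /* Seq Scan, Join, any non-Gather node */
} ToyTag;

typedef struct ToyNode
{
    ToyTag      tag;
    bool        has_projection; /* models ps->ps_ProjInfo != NULL */
    bool        got_dest;       /* models gstate->dest having been set */
    struct ToyNode **children;  /* subplans, used only for TOY_APPEND */
    int         nchildren;
} ToyNode;

/*
 * Walk the tree; mark every projection-free Gather that is reachable
 * through Append nodes only. Returns true if at least one Gather was
 * marked.
 */
bool
toy_push_down(ToyNode *node)
{
    bool        parallel = false;

    if (node == NULL)
        return false;

    if (node->tag == TOY_APPEND)
    {
        assert(node->nchildren == 0 || node->children != NULL);
        /* Visit every subplan; do not stop at the first success. */
        for (int i = 0; i < node->nchildren; i++)
            parallel |= toy_push_down(node->children[i]);
    }
    else if (node->tag == TOY_GATHER && !node->has_projection)
    {
        node->got_dest = true;
        parallel = true;
    }

    /* Any other node type is left alone and reports false. */
    return parallel;
}
```

Building an Append over one plain Gather, one Seq Scan and one Gather with a projection (roughly case 2 above) marks only the first Gather, and the call still returns true because one subplan can insert in parallel. A Gather at the top (cases 3 and 4) is marked directly without descending into the Parallel Append.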

Best regards,
houzj

Attachment Content-Type Size
0001-support-pctas-in-append-parallel-inserts.patch application/octet-stream 2.9 KB
0002-support-pctas-in-append-tuple-cost-adjustment.patch application/octet-stream 3.4 KB
