Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-15 08:36:21
Message-ID: CALj2ACWLA775yaSCo06FM+qh6is3W3Ht5SvqUHBzu=x0-jPa0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> Currently with the patch, we can allow parallel CTAS when topnode is Gather.
> When top-node is Append and Gather is the sub-node of Append, I think we can still enable
> Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
>
> Append
> ------>Gather
> --------->Create table
> ------------->Seqscan
> ------>Gather
> --------->create table
> ------------->Seqscan
>
> And the use case seems common to me, such as:
> select * from A where xxx union all select * from B where xxx;

Thanks for the append use case.

Here's my analysis on pushing parallel inserts down even in case the
top node is Append.

For union cases which need to remove duplicate tuples, we can't push
the inserts or CTAS dest receiver down. If I'm not wrong, Append node
is not doing duplicate removal(??), I saw that it's the HashAggregate
node (which is the top node that removes the duplicate tuples). And
also for except/except all/intersect/intersect all cases we receive
HashSetOp nodes on top of Append. So for both cases, our check for
Gather or Append at the top node is enough to detect this to not allow
parallel inserts.

For union all:
case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan

case 2: We can still push the CTAS dest receiver to each Gather node.
Non-Gather nodes will do inserts as they do now i.e. by sending tuples
to Append and from there to CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node

case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan

case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node

Please let me know if I'm missing any other possible use case.

Thoughts?

> I attach a WIP patch which just show the possibility of this feature.
> The patch is based on the latest v11-patch.
>
> What do you think?

As suggested by Amit earlier, I kept the 0001 patch(so far) such that
it doesn't have the code to influence the planner to consider parallel
tuple cost as 0. It works on the plan whatever gets generated and
decides to allow parallel inserts or not. And in the 0002 patch, I
added the code for influencing the planner to consider parallel tuple
cost as 0. Maybe we can have a 0003 patch for tests alone.

Once we are okay with the above analysis and use cases, we can
incorporate the Append changes to respective patches.

Hope that's okay.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2020-12-15 09:04:55 Re: libpq debug log
Previous Message Noah Misch 2020-12-15 08:20:30 Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)