RE: Parallel Inserts in CREATE TABLE AS

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-07 06:02:00
Message-ID: f4af0f3439b24ad48aceac3520c9160a@G08CNEXMBPEKD05.g08.fujitsu.local
Lists: pgsql-hackers

Hi

+    /*
+     * Flag to let the planner know that the SELECT query is for CTAS. This is
+     * used to calculate the tuple transfer cost from workers to gather node(in
+     * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+     * each worker will insert its share of tuples in parallel.
+     */
+    if (IsParallelInsertInCTASAllowed(into, NULL))
+        query->isForCTAS = true;

+    /*
+     * We do not compute the parallel_tuple_cost for CTAS because the number of
+     * tuples that are transferred from workers to the gather node is zero as
+     * each worker, in parallel, inserts the tuples that are resulted from its
+     * chunk of plan execution. This change may make the parallel plan cheap
+     * among all other plans, and influence the planner to consider this
+     * parallel plan.
+     */
+    if (!(root->parse->isForCTAS &&
+          root->query_level == 1))
+        run_cost += parallel_tuple_cost * path->path.rows;

I noticed that parallel_tuple_cost is still ignored
when Gather is not the top node.

Example:
Create table test(i int);
insert into test values(generate_series(1,10000000,1));
explain create table ntest3 as select * from test where i < 200 limit 10000;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)

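isForCTAS is true because the statement is CREATE TABLE AS, and
query_level is 1 because there is no subquery.
So even though Gather is not the top node (the Limit above it runs in the
leader, so every tuple still has to be transferred through Gather),
parallel_tuple_cost is still ignored.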

Is this working as expected?

Best regards,
houzj
