RE: Parallel Inserts in CREATE TABLE AS

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2021-03-17 09:45:33
Message-ID: OS0PR01MB571648256443B552B8BCBE4F946A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> > Seems like v22 patch was failing in cfbot for one of the unstable test cases.
> > Attaching v23 patch set with modification in 0003 and 0004 patches. No
> > changes to 0001 and 0002 patches. Hopefully cfbot will be happy with v23.
> >
> > Please consider v23 for further review.
> Hi,
>
> I was looking into the latest patch, here are some comments:

Few comments.

1)
Executing the following SQL will cause assertion failure.
-----------sql---------------
create table data(a int);
insert into data select 1 from generate_series(1,1000000,1) t;
explain (verbose) create table tt as select a,2 from data;
--------------------------

The stack message:
-----------stack---------------
TRAP: FailedAssertion("!allow && gather_exists && tuple_cost_opts && !(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED)", File: "execParallel.c", Line: 1872, PID: 1618247)
postgres: houzj postgres [local] EXPLAIN(ExceptionalCondition+0x8b)[0x940f0b]
postgres: houzj postgres [local] EXPLAIN[0x67ba1c]
postgres: houzj postgres [local] EXPLAIN(ExplainOnePlan+0x1c2)[0x605997]
postgres: houzj postgres [local] EXPLAIN[0x605d11]
postgres: houzj postgres [local] EXPLAIN(ExplainOneUtility+0x162)[0x605eb0]
--------------------------
In this case, The Gather node have projection in which case parallel CTAS is not supported,
but we still ignore the cost in planner.
If we plan to detect the projection, we may need to check tlist_same_exprs.

+ if (tlist_same_exprs)
+ {
ignore_parallel_tuple_cost(root);
+ }
generate_useful_gather_paths(root, rel, false);

2)
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
- gstate->dest = dest;
+ if (!gstate->ps.ps_ProjInfo)

IMO, It's better to add some comments about why we do not support projection for now.
Because, not all the projection are parallel unsafe (such as the case in 1) ), it will be desirable to support these later.

3)

+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
...
/* plan the query */
plan = pg_plan_query(query, pstate->p_sourcetext,
CURSOR_OPT_PARALLEL_OK, params);
...
+ if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+ &parallel_ins_info))
Currently, the patch call IsParallelInsertionAllowed() before and after pg_plan_query(),
This might lead to a misunderstanding that parallel_ins_info will get changed during pg_plan_query().
Since parallel_ins_info will not get changed in pg_plan_query, is it possible to add a bool flag(allowed)
in parallel_ins_info to avoid the second call of IsParallelInsertionAllowed ?

Best regards,
houzj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Guo 2021-03-17 10:42:46 Re: fdatasync performance problem with large number of DB files
Previous Message Zhihong Yu 2021-03-17 09:19:23 Re: A new function to wait for the backend exit after termination