RE: Parallel Inserts in CREATE TABLE AS

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-11 01:07:03
Message-ID: b7be257dbc784001af779e8d75bea5c2@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Attaching v21 patch set, which has following changes:
> 1) 0001 - changed fpes->ins_cmd_type ==
> PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> PARALLEL_INSERT_CMD_UNDEF
> 2) 0002 - reworded the commit message.
> 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> to ensure leader and worker insert the tuples in the same xact and replaced
> memory usage output in numbers like 25kB to NkB to make the tests stable.
> 4) 0004 - updated one of the test output to be in NkB and made the assertion
> in SetParallelInsertState to be not under an if condition.
>
> There's one open point [1] on selective skipping of error "cannot insert
> tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> welcome.
>
> Please consider the v21 patch set for further review.

Hi,

I took a look into the new patch and have some comments.

1.
+ /*
+ * Do not consider tuple cost in case of we intend to perform parallel
+ * inserts by workers. We would have turned on the ignore flag in
+ * apply_scanjoin_target_to_paths before generating Gather path for the
+ * upper level SELECT part of the query.
+ */
+ if ((root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_SELECT_QUERY) &&
+ (root->parse->parallelInsCmdTupleCostOpt &
+ PARALLEL_INSERT_CAN_IGN_TUP_COST))

Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ?
IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.

2.
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
...
+ info = (ParallelInsertCTASInfo *) ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;

+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+ void *ins_info)
...
+ info = (ParallelInsertCTASInfo *)ins_info;
+ intoclause_str = nodeToString(info->intoclause);
+ intoclause_len = strlen(intoclause_str) + 1;
+ intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);

I noticed the above code will call nodeToString and strlen twice which seems unnecessary.
Do you think it's better to store the result of nodetostring and strlen first and pass them when used ?

3.
+ if (node->need_to_scan_locally || node->nworkers_launched == 0)
+ {
+ EState *estate = node->ps.state;
+ TupleTableSlot *outerTupleSlot;
+
+ for(;;)
+ {
+ /* Install our DSA area while executing the plan. */
+ estate->es_query_dsa =
+ node->pei ? node->pei->area : NULL;
...
+ node->ps.state->es_processed++;
+ }

How about use the variables estate like 'estate-> es_processed++;'
Instead of node->ps.state->es_processed++;

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-01-11 02:34:53 Re: pg_preadv() and pg_pwritev()
Previous Message tsunakawa.takay@fujitsu.com 2021-01-11 00:14:04 RE: Disable WAL logging to speed up data loading