Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-11 03:21:46
Message-ID: CALj2ACUK58u-pQXmJjaF97b20v=ugppiwAiwZtgLFXn7SNSY8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 11, 2021 at 6:37 AM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > Attaching v21 patch set, which has following changes:
> > 1) 0001 - changed fpes->ins_cmd_type ==
> > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> > PARALLEL_INSERT_CMD_UNDEF
> > 2) 0002 - reworded the commit message.
> > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> > to ensure leader and worker insert the tuples in the same xact and replaced
> > memory usage output in numbers like 25kB to NkB to make the tests stable.
> > 4) 0004 - updated one of the test output to be in NkB and made the assertion
> > in SetParallelInsertState to be not under an if condition.
> >
> > There's one open point [1] on selective skipping of error "cannot insert
> > tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> > welcome.
> >
> > Please consider the v21 patch set for further review.
>
> Hi,
>
> I took a look into the new patch and have some comments.

Thanks.

> 1.
> + /*
> + * Do not consider tuple cost in case of we intend to perform parallel
> + * inserts by workers. We would have turned on the ignore flag in
> + * apply_scanjoin_target_to_paths before generating Gather path for the
> + * upper level SELECT part of the query.
> + */
> + if ((root->parse->parallelInsCmdTupleCostOpt &
> + PARALLEL_INSERT_SELECT_QUERY) &&
> + (root->parse->parallelInsCmdTupleCostOpt &
> + PARALLEL_INSERT_CAN_IGN_TUP_COST))
>
> Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ?
> IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.

+1. Changed.

> 2.
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> + void *ins_info)
> ...
> + info = (ParallelInsertCTASInfo *) ins_info;
> + intoclause_str = nodeToString(info->intoclause);
> + intoclause_len = strlen(intoclause_str) + 1;
>
> +static void
> +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> + void *ins_info)
> ...
> + info = (ParallelInsertCTASInfo *)ins_info;
> + intoclause_str = nodeToString(info->intoclause);
> + intoclause_len = strlen(intoclause_str) + 1;
> + intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
>
> I noticed the above code will call nodeToString and strlen twice which seems unnecessary.
> Do you think it's better to store the result of nodetostring and strlen first and pass them when used ?

I wanted to keep the API generic, not do nodeToString, strlen outside
and pass it to the APIs. I don't think it will add too much function
call cost since it's run only in the leader. This way, the code and
API looks more readable. Thoughts?

> 3.
> + if (node->need_to_scan_locally || node->nworkers_launched == 0)
> + {
> + EState *estate = node->ps.state;
> + TupleTableSlot *outerTupleSlot;
> +
> + for(;;)
> + {
> + /* Install our DSA area while executing the plan. */
> + estate->es_query_dsa =
> + node->pei ? node->pei->area : NULL;
> ...
> + node->ps.state->es_processed++;
> + }
>
> How about use the variables estate like 'estate-> es_processed++;'
> Instead of node->ps.state->es_processed++;

+1. Changed.

Attaching v22 patch set with changes only in 0001 and 0002. Please
consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v22-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch application/x-patch 36.3 KB
v22-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch application/x-patch 15.4 KB
v22-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch application/x-patch 30.4 KB
v22-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch application/x-patch 46.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-01-11 03:35:56 O(n^2) system calls in RemoveOldXlogFiles()
Previous Message Thomas Munro 2021-01-11 02:59:42 Re: pg_preadv() and pg_pwritev()