Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-28 12:14:16
Message-ID: CALj2ACX+i-vtmdrS3kzLVJx=3TQLXryoF3zWWgZ-8Qvo3RJyFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 28, 2020 at 1:16 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:
>
> + if (ignore &&
> + (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
>
> I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the above if since when ignore_parallel_tuple_cost returns true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already.

Sometimes, we may set the flag CTAS_PARALLEL_INS_TUP_COST_CAN_IGN
before generate_useful_gather_paths, but the
generate_useful_gather_paths can return without reaching cost_gather
where we reset. The generate_useful_gather_paths can return without
reaching cost_gather, in following case

if (rel->partial_pathlist == NIL)
return;

So, for such cases, I'm resetting it here.

> + * In this function we only care Append and Gather nodes.
>
> 'care' -> 'care about'

Done.

> + for (int i = 0; i < aps->as_nplans; i++)
> + {
> + parallel |= PushDownCTASParallelInsertState(dest,
> + aps->appendplans[i],
> + gather_exists);
>
> It seems the loop termination condition can include parallel since we can come out of the loop once parallel is true.

No, we can not come out of the for loop if parallel is true, because
our intention there is to look for all the child/sub plans under
Append, and push the inserts to the Gather nodes wherever possible.

> + if (!allow && tuple_cost_flags && gather_exists)
>
> As the above code shows, gather_exists is only checked when allow is false.

Yes, if at least one gather node exists under the Append for which the
planner would have ignored the tuple cost, and now if we don't allow
parallel inserts, we should assert that the parallelism is not picked
because of wrong parallel tuple cost enforcement.

> + * We set the flag for two cases when there is no parent path will
> + * be created(such as : limit,sort,distinct...):
>
> Please correct the grammar : there are two verbs following 'when'

Done.

> For set_append_rel_size:
>
> + {
> + root->parse->CTASParallelInsInfo |=
> + CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
> + }
> + }
> +
> + if (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
> + {
> + root->parse->CTASParallelInsInfo &=
> + ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
>
> In the if block for childrel->rtekind == RTE_SUBQUERY, CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND maybe set. Why is it cleared immediately after ?

Thanks for pointing that out. It's a miss, intention is to reset it
after set_rel_size(). Corrected in the v17 patch.

> + /* Set to this in case tuple cost needs to be ignored for Append cases. */
> + CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
>
> Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn on' or similar term in the comment. Because 'set to' normally means assignment.

Done.

All the above comments are addressed in the v17 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-28 12:15:20 Re: Parallel Inserts in CREATE TABLE AS
Previous Message Bharath Rupireddy 2020-12-28 12:12:25 Re: Parallel Inserts in CREATE TABLE AS