Re: Parallel Inserts in CREATE TABLE AS

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-30 09:34:14
Message-ID: CAFiTN-u_PEEYke66jMjLdXrcwgnmg-0nUG5oG-XtOqdQ1_vYVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 30, 2020 at 10:49 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>> > I have completed reviewing 0001, I don't have more comments, just one
>> > question. Soon I will review the remaining patches.
>>
>> Thanks.
>>
>> > + /* If parallel inserts are to be allowed, set a few extra information. */
>> > + if (myState->is_parallel)
>> > + {
>> > + myState->object_id = intoRelationAddr.objectId;
>> > +
>> > + /*
>> > + * We don't need to skip contacting FSM while inserting tuples for
>> > + * parallel mode, while extending the relations, workers instead of
>> > + * blocking on a page while another worker is inserting, can check the
>> > + * FSM for another page that can accommodate the tuples. This results
>> > + * in major benefit for parallel inserts.
>> > + */
>> > + myState->ti_options = 0;
>> >
>> > Is there any performance data for this or just theoretical analysis?
>>
>> I have seen that we don't get much performance with the skip fsm
>> option, though I don't have the data to back it up. I'm planning to
>> run performance tests after the patches 0001, 0002 and 0003 get
>> reviewed. I will capture the data at that time. Hope that's fine.
>
>
> Yeah that’s fine
>

Some comments in 0002

1.
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;

I don't like the naming of these flags. Especially no need to define
CTAS_PARALLEL_INS_UNDEF, we can directl use 0
for that purpose instead of giving some weird name. So I suggest
first, just get rid of CTAS_PARALLEL_INS_UNDEF.

2.
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
generate_useful_gather_paths(root, rel, false);

+ /*
+ * Reset the ignore flag, in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, we would have been reset it there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }

I think th way we are using these cost ignoring flag, doesn't look clean.

I mean first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
CTAS and then ignore_parallel_tuple_cost will
set the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies certain
condition which is fine. Now, internally cost
gather will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN and if
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we will remove
it outside. Why do we need to remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?

3.
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));

Instead of adding Assert inside an IF statement, you can convert whole
statement as an assert. Lets not add unnecessary
if in the release mode.

4.
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;

Changes this to (if, else) as shown below, because if it goes to the
IF part then ignore_tuple_cost will always be true
so no need to have an extra if check.

if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
(root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
{
ignore_tuple_cost = true;
root->parse->CTASParallelInsInfo &=
~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}
else
run_cost += parallel_tuple_cost * path->path.rows;

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2020-12-30 10:55:36 popcount
Previous Message Denis Smirnov 2020-12-30 09:12:22 Re: PoC Refactor AM analyse API