Re: Parallel Inserts in CREATE TABLE AS

From: Luc Vlaming <luc(at)swarm64(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-04 10:52:29
Message-ID: 04c52d1d-7e67-90dd-1cdc-a84b09229c9b@swarm64.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30-12-2020 04:55, Bharath Rupireddy wrote:
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>> w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
>>
>> + * Push the dest receiver to Gather node when it is either at the top of the
>> + * plan or under top Append node unless it does not have any projections to do.
>>
>> I think the 'unless' should be 'if'. As can be seen from the body of the method:
>>
>> + if (!ps->ps_ProjInfo)
>> + {
>> + GatherState *gstate = (GatherState *) ps;
>> +
>> + parallel = true;
>
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
>
> Please consider v18 patch set for further review.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Hi,

Sorry it took so long to get back to reviewing this.

wrt v18-0001....patch:

+ /*
+ * If the worker is for parallel insert in CTAS, then use the proper
+ * dest receiver.
+ */
+ intoclause = (IntoClause *) stringToNode(intoclausestr);
+ receiver = CreateIntoRelDestReceiver(intoclause);
+ ((DR_intorel *)receiver)->is_parallel_worker = true;
+ ((DR_intorel *)receiver)->object_id = fpes->objectid;
I would move this into a function called e.g.
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
createas.c.
I would then also split up intorel_startup into intorel_leader_startup
and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
self->pub.rStartup to intorel_worker_startup.

+ volatile pg_atomic_uint64 *processed;
why is it volatile?

+ if (isctas)
+ {
+ intoclause = ((DR_intorel *) node->dest)->into;
+ objectid = ((DR_intorel *) node->dest)->object_id;
+ }
Given that you extract them each once and then pass them directly into
the parallel-worker, can't you instead pass in the destreceiver and
leave that logic to ExecInitParallelPlan?

+ if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which i assume (wrongly?) is always done?

+ * In case if no workers were launched, allow the leader to insert entire
+ * tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?

================
wrt v18-0002....patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?

================
wrt v18-0003....patch:

not sure if it is needed, but i was wondering if we would want more
tests with multiple gather nodes existing? caused e.g. by using CTE's,
valid subquery's (like the one test you have, but without the group
by/having)?

Kind regards,
Luc

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-01-04 10:55:40 Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Previous Message Daniil Zakhlystov 2021-01-04 10:17:50 Re: zstd compression for pg_dump