Re: Parallel Inserts in CREATE TABLE AS

From: Luc Vlaming <luc(at)swarm64(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-05 07:30:37
Message-ID: a18042e1-9340-5df5-835f-1271b7a444f2@swarm64.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05-01-2021 04:59, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>>> + if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
>>> + ((DR_intorel *) gstate->dest)->into->rel &&
>>> + ((DR_intorel *) gstate->dest)->into->rel->relname)
>>> why would rel and relname not be there? if no rows have been inserted?
>>> because it seems from the intorel_startup function that that would be
>>> set as soon as startup was done, which i assume (wrongly?) is always done?
>>
>> Actually, that into clause rel variable is always being set in the gram.y for CTAS, Create Materialized View and SELECT INTO (because qualified_name non-terminal is not optional). My bad. I just added it as a sanity check. Actually, it's not required.
>>
>> create_as_target:
>> qualified_name opt_column_list table_access_method_clause
>> OptWith OnCommitOption OptTableSpace
>> {
>> $$ = makeNode(IntoClause);
>> $$->rel = $1;
>> create_mv_target:
>> qualified_name opt_column_list table_access_method_clause opt_reloptions OptTableSpace
>> {
>> $$ = makeNode(IntoClause);
>> $$->rel = $1;
>> into_clause:
>> INTO OptTempTableName
>> {
>> $$ = makeNode(IntoClause);
>> $$->rel = $2;
>>
>> I will change the below code:
>> + if (GetParallelInsertCmdType(gstate->dest) ==
>> + PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
>> + ((DR_intorel *) gstate->dest)->into &&
>> + ((DR_intorel *) gstate->dest)->into->rel &&
>> + ((DR_intorel *) gstate->dest)->into->rel->relname)
>> + {
>>
>> to:
>> + if (GetParallelInsertCmdType(gstate->dest) ==
>> + PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>> + {
>>
>> I will update this in the next version of the patch set.
>
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Hi,

Reviewing further v20-0001:

I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as its completely
independent code. Just a matter of style I guess.

Maybe I'm not completely following why but afaics we want parallel
inserts in various scenarios, not just CTAS? I'm asking because code like
+ if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ pg_atomic_add_fetch_u64(&fpes->processed,
queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that
would be generalized soon after? Basically I would have expected the if
to compare against PARALLEL_INSERT_CMD_UNDEF.

Apart from these small things v20-0001 looks (very) good to me.

v20-0002:
will reply on the specific mail-thread about the state machine

v20-0003 and v20-0004:
looks good to me.

Kind regards,
Luc

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-01-05 07:54:11 Re: Moving other hex functions to /common
Previous Message Michael Paquier 2021-01-05 07:17:34 Re: pg_rewind restore_command issue in PG12