Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Luc Vlaming <luc(at)swarm64(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-04 13:32:43
Message-ID: CALj2ACVufcSF4KG5M7g+hP91S+FeOrhO4U8v=B_tFk8on9ufZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
> Sorry it took so long to get back to reviewing this.

Thanks for the comments.

> wrt v18-0001....patch:
>
> + /*
> + * If the worker is for parallel insert in CTAS, then use
the proper
> + * dest receiver.
> + */
> + intoclause = (IntoClause *) stringToNode(intoclausestr);
> + receiver = CreateIntoRelDestReceiver(intoclause);
> + ((DR_intorel *)receiver)->is_parallel_worker = true;
> + ((DR_intorel *)receiver)->object_id = fpes->objectid;
> I would move this into a function called e.g.
> GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> createas.c.
> I would then also split up intorel_startup into intorel_leader_startup
> and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which for workers it just does the
minimalistic work and exit from it. In the leader most of the table
creation and sanity check is kept untouched. Please have a look at the v19
patch posted upthread [1].

> + volatile pg_atomic_uint64 *processed;
> why is it volatile?

Intention is to always read from the actual memory location. I referred it
from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their u32
counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.

> + if (isctas)
> + {
> + intoclause = ((DR_intorel *)
node->dest)->into;
> + objectid = ((DR_intorel *)
node->dest)->object_id;
> + }
> Given that you extract them each once and then pass them directly into
> the parallel-worker, can't you instead pass in the destreceiver and
> leave that logic to ExecInitParallelPlan?

That's changed entirely in the v19 patch set posted upthread [1]. Please
have a look. I didn't pass the dest receiver, to keep the API generic, I
passed parallel insert command type and a void * ptr which points to
insertion command because the information we pass to workers depends on the
insertion command (for instance, the information needed by workers is for
CTAS into clause and object id and for Refresh Mat View object id).

>
> + if
(IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> + ((DR_intorel *)
gstate->dest)->into->rel &&
> + ((DR_intorel *)
gstate->dest)->into->rel->relname)
> why would rel and relname not be there? if no rows have been inserted?
> because it seems from the intorel_startup function that that would be
> set as soon as startup was done, which i assume (wrongly?) is always done?

Actually, that into clause rel variable is always being set in the gram.y
for CTAS, Create Materialized View and SELECT INTO (because qualified_name
non-terminal is not optional). My bad. I just added it as a sanity check.
Actually, it's not required.

create_as_target:
*qualified_name* opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
* $$->rel = $1;*
create_mv_target:
*qualified_name* opt_column_list table_access_method_clause
opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
* $$->rel = $1;*
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
* $$->rel = $2;*

I will change the below code:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+ ((DR_intorel *) gstate->dest)->into &&
+ ((DR_intorel *) gstate->dest)->into->rel &&
+ ((DR_intorel *) gstate->dest)->into->rel->relname)
+ {

to:
+ if (GetParallelInsertCmdType(gstate->dest) ==
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ {

I will update this in the next version of the patch set.

>
> + * In case if no workers were launched, allow the leader to
insert entire
> + * tuples.
> what does "entire tuples" mean? should it maybe be "all tuples"?

Yeah, noticed that while working on the v19 patch set. Please have a look
at the v19 patch posted upthread [1].

> ================
> wrt v18-0003....patch:
>
> not sure if it is needed, but i was wondering if we would want more
> tests with multiple gather nodes existing? caused e.g. by using CTE's,
> valid subquery's (like the one test you have, but without the group
> by/having)?

I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH, WITH
RECURSIVE and I don't see that any of the WITH clause processing hits
createas.c functions. So, IMHO, we don't need to add them. Please let me
know if there are any specific use cases you have in mind.

For instance, I tried to cover Init/Sub Plan and Subquery cases with:

below case has multiple Gather, Init Plan:
+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select two from (select * from tenk2) as tt limit 1) col2
+ from tenk1 where tenk1.four = 3;');

below case has Gather, Sub Plan:
+-- parallel inserts must not occur, as there is sub plan that gets
executed by
+-- the Gather node in leader
+select explain_pictas(
+'create table parallel_write as select two col1,
+ (select tenk1.two from generate_series(1,1)) col2
+ from tenk1 where tenk1.four = 3;');

For multiple Gather node cases, I covered them with the Union All/Append
cases in the 0004 patch. Please have a look.

[1] -
https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Önder Kalacı 2021-01-04 13:37:39 Re: row filtering for logical replication
Previous Message Joel Jacobson 2021-01-04 13:30:02 Re: Test harness for regex code (to allow importing Tcl's test suite)