Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-27 08:50:11
Message-ID: CALj2ACUgkddMDLMtUMHzof_=4Vbx9uyYut+_XUJ+uSOUb1LeMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I have reviewed part of v15-0001 patch, I have a few comments, I will
> continue to review this.

Thanks a lot.

> 1.
> Why is this temporary hack? and what is the plan for removing this hack?

The changes in xact.c, xact.h and heapam.c are common to all the
parallel insert patches - COPY, INSERT INTO SELECT. That was the
initial comment, I forgot to keep it in sync with the other patches.
Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan
was to have these code in all the parallel inserts patch, whichever
gets to review and commit first, others will update their patches
accordingly.

> 2.
> +/*
> + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> + * insertion is possible, if yes set the parallel insert state i.e. push down
> + * the dest receiver to the Gather nodes.
> + */
> +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> +{
> + if (!IS_CTAS(into))
> + return;
>
> When will this hit? The functtion name suggest that it is from CTAS
> but now you have a check that if it is
> not for CTAS then return, can you add the comment that when do you
> expect this case?

Yes it will hit for explain cases, but I choose to remove this and
check outside in the explain something like:
if (into)
ChooseParallelInsertsInCTAS()

> Also the function name should start in a new line
> i.e
> void
> ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)

Ah, missed that. Modified now.

> 3.
> +/*
> + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> + * insertion is possible, if yes set the parallel insert state i.e. push down
> + * the dest receiver to the Gather nodes.
> + */
>
> Push down to the Gather nodes? I think the right statement will be
> push down below the Gather node.

Modified.

> 4.
> intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> {
> DR_intorel *myState = (DR_intorel *) self;
>
> -- Comment ->in parallel worker we don't need to crease dest recv blah blah
> + if (myState->is_parallel_worker)
> {
> --parallel worker handling--
> return;
> }
>
> --non-parallel worker code stay right there, instead of moving to else

Done.

> 5.
> +/*
> + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> + * insertion is possible, if yes set the parallel insert state i.e. push down
> + * the dest receiver to the Gather nodes.
> + */
> +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> +{
>
> From function name and comments it appeared that this function will
> return boolean saying whether
> Parallel insert should be selected or not. I think name/comment
> should be better for this

Yeah that function can still return void because no point in returning
bool there, since the intention is to see if parallel inserts can be
performed, if yes, set the state otherwise exit. I changed the
function name to TryParallelizingInsertsInCTAS(). Let me know your
suggestions if that doesn't work out.

> 6.
> /*
> + * For parallelizing inserts in CTAS i.e. making each parallel worker
> + * insert the tuples, we must send information such as into clause (for
> + * each worker to build separate dest receiver), object id (for each
> + * worker to open the created table).
>
> Comment is saying we need to pass object id but the code under this
> comment is not doing so.

Improved the comment.

> 7.
> + /*
> + * Since there are no rows that are transferred from workers to Gather
> + * node, so we set it to 0 to be visible in estimated row count of
> + * explain plans.
> + */
> + queryDesc->planstate->plan->plan_rows = 0;
>
> This seems a bit hackies Why it is done after the planning, I mean
> plan must know that it is returning a 0 rows?

This exists to show up the estimated row count(in case of EXPLAIN CTAS
without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS actual tuples
are shown correctly as 0 because Gather doesn't receive any tuples.
if (es->costs)
{
if (es->format == EXPLAIN_FORMAT_TEXT)
{
appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)",
plan->startup_cost, plan->total_cost,
plan->plan_rows, plan->plan_width);

Since it's an estimated row count(which may not be always correct), we
will let the EXPLAIN plan show that and I think we can remove that
part. Thoughts?

I removed it in v6 patch set.

> 8.
> + char *intoclause_space = shm_toc_allocate(pcxt->toc,
> + intoclause_len);
> + memcpy(intoclause_space, intoclausestr, intoclause_len);
> + shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
>
> One blank line between variable declaration and next code segment,
> take care at other places as well.

Done.

I'm attaching the v16 patch set. Please note that I added the
documentation saying that parallel insertions can happen and a sample
output of the explain to 0003 patch as discussed in [1]. But I didn't
move the explain output related code to a separate patch because it's
a small snippet in explain.c. I hope that's okay.

[1] - https://www.postgresql.org/message-id/CAA4eK1JqwXGYoGa1%2B3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v16-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch application/octet-stream 27.3 KB
v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch application/octet-stream 12.6 KB
v16-0003-Tests-And-Docs-For-Parallel-Inserts-in-CTAS.patch application/octet-stream 29.9 KB
v16-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch application/octet-stream 44.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-27 08:57:54 Re: Parallel Inserts in CREATE TABLE AS
Previous Message Michael Paquier 2020-12-27 08:48:47 Re: pgsql: Add pg_alterckey utility to change the cluster key