Re: Parallel Inserts in CREATE TABLE AS

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-07 09:26:37
Message-ID: CAA4eK1KdsYKdzGXocEkNoZ=meExf9NZA7DdoBZd323605b-YaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>
> Hi
>
> + /*
> + * Flag to let the planner know that the SELECT query is for CTAS. This is
> + * used to calculate the tuple transfer cost from workers to gather node(in
> + * case parallelism kicks in for the SELECT part of the CTAS), to zero as
> + * each worker will insert its share of tuples in parallel.
> + */
> + if (IsParallelInsertInCTASAllowed(into, NULL))
> + query->isForCTAS = true;
>
>
> + /*
> + * We do not compute the parallel_tuple_cost for CTAS because the number of
> + * tuples that are transferred from workers to the gather node is zero as
> + * each worker, in parallel, inserts the tuples that are resulted from its
> + * chunk of plan execution. This change may make the parallel plan cheap
> + * among all other plans, and influence the planner to consider this
> + * parallel plan.
> + */
> + if (!(root->parse->isForCTAS &&
> + root->query_level == 1))
> + run_cost += parallel_tuple_cost * path->path.rows;
>
> I noticed that the parallel_tuple_cost will still be ignored,
> When Gather is not the top node.
>
> Example:
> Create table test(i int);
> insert into test values(generate_series(1,10000000,1));
> explain create table ntest3 as select * from test where i < 200 limit 10000;
>
> QUERY PLAN
> -------------------------------------------------------------------------------
> Limit (cost=1000.00..97331.33 rows=1000 width=4)
> -> Gather (cost=1000.00..97331.33 rows=1000 width=4)
> Workers Planned: 2
> -> Parallel Seq Scan on test (cost=0.00..96331.33 rows=417 width=4)
> Filter: (i < 200)
>
>
> The isForCTAS will be true because [create table as], the
> query_level is always 1 because there is no subquery.
> So even if gather is not the top node, parallel cost will still be ignored.
>
> Is that works as expected ?
>

I don't think that is expected and is not the case without this patch.
The cost shouldn't be changed for existing cases where the write is
not pushed to workers.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2020-12-07 09:30:26 RE: Parallel copy
Previous Message Greg Nancarrow 2020-12-07 09:04:59 Re: Parallel INSERT (INTO ... SELECT ...)