Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-07 13:34:24
Message-ID: CALj2ACUQS=fyAgj29hLCVh=jw0k3oEZHSG=WWgFx6UTg2p2eiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 7, 2020 at 5:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 7, 2020 at 4:20 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 7, 2020 at 4:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > What is the need of checking query_level when 'isForCTAS' is set only
> > > when Gather is a top-node?
> > >
> >
> > isForCTAS is getting set before pg_plan_query() which is being used in
> > cost_gather(). We will not have a Gather node by then and hence will
> > not pass queryDesc to IsParallelInsertInCTASAllowed(into, NULL) while
> > setting isForCTAS to true.
> >
>
> IsParallelInsertInCTASAllowed() seems to be returning false if
> queryDesc is NULL, so won't isForCTAS be always set to false? I think
> I am missing something here.
>

My bad. I utterly missed this, sorry for the confusion.

My intention to have IsParallelInsertInCTASAllowed() is for two
purposes. 1. when called before planning without queryDesc, it should
return true if IS_CTAS(into) is true and is not a temporary table. 2.
when called after planning with a non-null queryDesc, along with 1)
checks, it should also perform the Gather State checks and return
accordingly.

I have corrected it in v9 patch. Please have a look.

>
> > > The isForCTAS will be true because [create table as], the
> > > query_level is always 1 because there is no subquery.
> > > So even if gather is not the top node, parallel cost will still be ignored.
> > >
> > > Is that works as expected ?
> > >
> >
> > I don't think that is expected and is not the case without this patch.
> > The cost shouldn't be changed for existing cases where the write is
> > not pushed to workers.
> >
>
> Thanks for pointing that out. Yes it should not change for the cases
> where parallel inserts will not be picked later.
>
> Any better suggestions on how to make the planner consider that the
> CTAS might choose parallel inserts later at the same time avoiding the
> above issue in case it doesn't?
>

I'm not quite sure how to address this. Can we not allow the planner
to consider that the select is for CTAS and check only after the
planning is done for the Gather node and other checks? This is simple
to do, but we might miss some parallel plans for the SELECTs because
the planner would have already considered the tuple transfer cost from
workers to Gather wrongly because of which that parallel plan would
have become costlier compared to non parallel plans. IMO, we can do
this since it also keeps the existing behaviour of the planner i.e.
when the planner is planning for SELECTs it doesn't know that it is
doing it for CTAS. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v9-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch application/octet-stream 49.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message kubilay.dag 2020-12-07 13:49:01 AW: posgres 12 bug (partitioned table)
Previous Message Смирнов Денис 2020-12-07 13:23:42 PoC Refactor AM analyse API