Re: Parallel Inserts in CREATE TABLE AS

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-09 04:46:41
Message-ID: CAFiTN-sgAYq8bZNL8r7_dU2QWFOKASw0h=r4cKTZ7zmK+uQySw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>
> > > I'm not quite sure how to address this. Can we not allow the planner
> > > to consider that the select is for CTAS and check only after the
> > > planning is done for the Gather node and other checks?
> > >
> >
> > IIUC, you are saying that we should not influence the cost of gather node
> > even when the insertion would be done by workers? I think that should be
> > our fallback option anyway but that might miss some paths to be considered
> > parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> > transfer cost). I think the idea is we can avoid the tuple transfer cost
> > only when Gather is the top node because only at that time we can push
> > insertion down, right? How about if we have some way to detect the same
> > before calling generate_useful_gather_paths()? I think when we are calling
> > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > query_level is 1, it is for CTAS, and it doesn't have a chance to create
> > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> > probably assume that the Gather will be top_node. I am not sure about this
> > but I think it is worth exploring.
> >
>
> I took a look at the parallel insert patch and have the same idea.
> https://commitfest.postgresql.org/31/2844/
>
> * Consider generating Gather or Gather Merge paths. We must only do this
> * if the relation is parallel safe, and we don't do it for child rels to
> * avoid creating multiple Gather nodes within the same plan. We must do
> * this after all paths have been generated and before set_cheapest, since
> * one of the generated paths may turn out to be the cheapest one.
> */
> if (rel->consider_parallel && !IS_OTHER_REL(rel))
> generate_useful_gather_paths(root, rel, false);
>
> IMO Gatherpath created here seems the right one which can possible ignore parallel cost if in CTAS.
> But We need check the following parse option which will create path to be the parent of Gatherpath here.
>
> if (root->parse->rowMarks)
> if (limit_needed(root->parse))
> if (root->parse->sortClause)
> if (root->parse->distinctClause)
> if (root->parse->hasWindowFuncs)
> if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->root->hasHavingQual)
>

Yeah, and as I pointed earlier, along with this we also need to
consider that the RelOptInfo must be the final target(top level rel).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-12-09 05:47:25 Re: Refactor MD5 implementations and switch to EVP for OpenSSL
Previous Message Greg Nancarrow 2020-12-09 04:40:52 Re: Parallel INSERT (INTO ... SELECT ...)