Re: Parallel Inserts in CREATE TABLE AS

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-10 02:19:05
Message-ID: CALNJ-vTMt4AhR61TrQ9VYUeMQHRLoPR=KFMZRehT0Lv+-agigg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

+ if (!OidIsValid(col->collOid) &&
+ type_is_collatable(col->typeName->typeOid))
+ ereport(ERROR,
...
+ attrList = lappend(attrList, col);

Should attrList be freed when ereport is called ?

+ query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;

Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning
the value of 0 ?

Cheers

On Wed, Dec 9, 2020 at 5:43 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
> wrote:
> > >
> > > > > I'm not quite sure how to address this. Can we not allow the
> planner
> > > > > to consider that the select is for CTAS and check only after the
> > > > > planning is done for the Gather node and other checks?
> > > > >
> > > >
> > > > IIUC, you are saying that we should not influence the cost of gather
> node
> > > > even when the insertion would be done by workers? I think that
> should be
> > > > our fallback option anyway but that might miss some paths to be
> considered
> > > > parallel where the cost becomes more due to parallel_tuple_cost (aka
> tuple
> > > > transfer cost). I think the idea is we can avoid the tuple transfer
> cost
> > > > only when Gather is the top node because only at that time we can
> push
> > > > insertion down, right? How about if we have some way to detect the
> same
> > > > before calling generate_useful_gather_paths()? I think when we are
> calling
> > > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > > query_level is 1, it is for CTAS, and it doesn't have a chance to
> create
> > > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we
> can
> > > > probably assume that the Gather will be top_node. I am not sure
> about this
> > > > but I think it is worth exploring.
> > > >
> > >
> > > I took a look at the parallel insert patch and have the same idea.
> > > https://commitfest.postgresql.org/31/2844/
> > >
> > > * Consider generating Gather or Gather Merge paths. We must
> only do this
> > > * if the relation is parallel safe, and we don't do it for
> child rels to
> > > * avoid creating multiple Gather nodes within the same plan.
> We must do
> > > * this after all paths have been generated and before
> set_cheapest, since
> > > * one of the generated paths may turn out to be the cheapest
> one.
> > > */
> > > if (rel->consider_parallel && !IS_OTHER_REL(rel))
> > > generate_useful_gather_paths(root, rel, false);
> > >
> > > IMO Gatherpath created here seems the right one which can possible
> ignore parallel cost if in CTAS.
> > > But We need check the following parse option which will create path to
> be the parent of Gatherpath here.
> > >
> > > if (root->parse->rowMarks)
> > > if (limit_needed(root->parse))
> > > if (root->parse->sortClause)
> > > if (root->parse->distinctClause)
> > > if (root->parse->hasWindowFuncs)
> > > if (root->parse->groupClause || root->parse->groupingSets ||
> root->parse->hasAggs || root->root->hasHavingQual)
> > >
> >
> > Yeah, and as I pointed earlier, along with this we also need to
> > consider that the RelOptInfo must be the final target(top level rel).
> >
>
> Attaching v10 patch set that includes the change suggested above for
> ignoring parallel tuple cost and also few more test cases. I split the
> patch as per Amit's suggestion. v10-0001 contains parallel inserts
> code without planner tuple cost changes and test cases. v10-0002 has
> required changes for ignoring planner tuple cost calculations.
>
> Please review it further.
>
> After the review and addressing all the comments, I plan to make some
> code common so that it can be used for Parallel Inserts in REFRESH
> MATERIALIZED VIEW. Thoughts?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-12-10 02:23:54 RE: Parallel INSERT (INTO ... SELECT ...)
Previous Message Patrick Handja 2020-12-10 01:54:13 Re: Setof RangeType returns