Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)cn(dot)fujitsu(dot)com" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 10:03:42
Message-ID: CA+HiwqGbOFjhf_Rv7pid2kWCCfXuZpnmbB1zp5Pd2VmXYzCMNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 10, 2021 at 5:52 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> From: Amit Langote <amitlangote09(at)gmail(dot)com>
> > Just to be clear, I'm not suggesting that we should put effort into
> > making INSERT ... VALUES run in parallel. I'm just raising my concern
> > about embedding the assumption in max_parallel_hazard() that it will
> > never make sense to do so.
>
> I'm sorry I misunderstood your suggestion. So, you're suggesting that it may be better to place the VALUES existence check outside max_parallel_hazard(). (I may be a bit worried if I may misunderstanding in a different way.)

To add context to my comments, here's the block of code in the patch I
was commenting on:

+ /*
+ * If there is no underlying SELECT, a parallel table-modification
+ * operation is not possible (nor desirable).
+ */
+ hasSubQuery = false;
+ foreach(lc, parse->rtable)
+ {
+ rte = lfirst_node(RangeTblEntry, lc);
+ if (rte->rtekind == RTE_SUBQUERY)
+ {
+ hasSubQuery = true;
+ break;
+ }
+ }
+ if (!hasSubQuery)
+ return PROPARALLEL_UNSAFE;

For a modification query, this makes max_parallel_hazard() return that
it is unsafe to parallelize the query because it doesn't contain a
subquery RTE, or only contains a VALUES RTE.

I was trying to say that inside max_parallel_hazard() seems to be a
wrong place to reject parallelism for modification if only because
there are no subquery RTEs in the query. Although now I'm thinking
that maybe it's okay as long as it's appropriately placed. I shared
one suggestion in my reply to Amit K's email.

> The description of max_parallel_hazard() gave me an impression that this is the right place to check VALUES, because its role can be paraphrased in simpler words like "Tell you if the given Query is safe for parallel execution."
>
> In that regard, the standard_planner()'s if conditions that check Query->commandType and Query->hasModifyingCTE can be moved into max_parallel_hazard() too.
>
> But looking closer to the description, it says "Returns the worst function hazard." Function hazard? Should this function only check functions? Or do we want to modify this description and get max_parallel_hazard() to provide more service?

Yeah, updating the description to be more general may make sense.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-10 10:10:37 Re: repeated decoding of prepared transactions
Previous Message Hou, Zhijie 2021-02-10 09:59:13 RE: Parallel INSERT (INTO ... SELECT ...)