Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 03:38:53
Message-ID: CA+HiwqFU13k43Q_gPJHocmjc8wsmQBgyqjo6-cVjKx4gUH+V3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 9, 2021 at 10:30 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> On Tue, Feb 9, 2021 at 1:04 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > That brings me to to this part of the hunk:
> >
> > + /*
> > + * If there is no underlying SELECT, a parallel table-modification
> > + * operation is not possible (nor desirable).
> > + */
> > + hasSubQuery = false;
> > + foreach(lc, parse->rtable)
> > + {
> > + rte = lfirst_node(RangeTblEntry, lc);
> > + if (rte->rtekind == RTE_SUBQUERY)
> > + {
> > + hasSubQuery = true;
> > + break;
> > + }
> > + }
> > + if (!hasSubQuery)
> > + return PROPARALLEL_UNSAFE;
> >
> > The justification for this given in:
> >
> > https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com
> >
> > seems to be that the failure of a test case in
> > partition-concurrent-attach isolation suite is prevented if finding no
> > subquery RTEs in the query is flagged as parallel unsafe, which in
> > turn stops max_parallel_hazard_modify() from locking partitions for
> > safety checks in such cases. But it feels unprincipled to have this
> > code to work around a specific test case that's failing. I'd rather
> > edit the failing test case to disable parallel execution as
> > Tsunakawa-san suggested.
> >
>
> The code was not changed because of the test case (though it was
> fortunate that the test case worked after the change).

Ah, I misread then, sorry.

> The code check that you have identified above ensures that the INSERT
> has an underlying SELECT, because the planner won't (and shouldn't
> anyway) generate a parallel plan for INSERT...VALUES, so there is no
> point doing any parallel-safety checks in this case.
> It just so happens that the problem test case uses INSERT...VALUES -
> and it shouldn't have triggered the parallel-safety checks for
> parallel INSERT for this case anyway, because INSERT...VALUES can't
> (and shouldn't) be parallelized.

AFAICS, max_parallel_hazard() path never bails from doing further
safety checks based on anything other than finding a query component
whose hazard level crosses context->max_interesting. You're trying to
add something that bails based on second-guessing that a parallel path
would not be chosen, which I find somewhat objectionable.

If the main goal of bailing out is to avoid doing the potentially
expensive modification safety check on the target relation, maybe we
should try to somehow make the check less expensive. I remember
reading somewhere in the thread about caching the result of this check
in relcache, but haven't closely studied the feasibility of doing so.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-02-10 03:44:12 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Andy Fan 2021-02-10 03:27:25 Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)