Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 04:35:32
Message-ID: CAJcOf-ftcR44pQPy-DdeJGnAT5d54LO_c2Nf=Z=wLZnpzdp-4A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 10, 2021 at 2:39 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> > The code check that you have identified above ensures that the INSERT
> > has an underlying SELECT, because the planner won't (and shouldn't
> > anyway) generate a parallel plan for INSERT...VALUES, so there is no
> > point doing any parallel-safety checks in this case.
> > It just so happens that the problem test case uses INSERT...VALUES -
> > and it shouldn't have triggered the parallel-safety checks for
> > parallel INSERT for this case anyway, because INSERT...VALUES can't
> > (and shouldn't) be parallelized.
>
> AFAICS, max_parallel_hazard() path never bails from doing further
> safety checks based on anything other than finding a query component
> whose hazard level crosses context->max_interesting.

INSERT...VALUES is parallel UNSAFE because it's neither safe nor even
possible to have a parallel plan for it (and since UNSAFE is the
maximum hazard level, there's no point in referencing
context->max_interesting).
And there are existing cases of bailing out and not doing further
safety checks (in fact, your v15_delta.diff patch placed its code for
bailing out on "ON CONFLICT ... DO UPDATE" underneath one such
existing case in max_parallel_hazard_walker()):

    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }
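
To make it concrete, here's a rough sketch (not the actual patch code) of
the kind of early bail-out I mean, placed in that same Query branch of
max_parallel_hazard_walker(); "insert_has_underlying_select" is just a
hypothetical helper name standing in for the check:

    /*
     * Sketch only: bail out early for a top-level INSERT with no
     * underlying SELECT (i.e. INSERT...VALUES), since the planner never
     * generates a parallel plan for that case anyway.
     * "insert_has_underlying_select" is a hypothetical helper, not the
     * actual patch code.
     */
    if (query->commandType == CMD_INSERT &&
        !insert_has_underlying_select(query))
    {
        context->max_hazard = PROPARALLEL_UNSAFE;
        return true;
    }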

> You're trying to
> add something that bails based on second-guessing that a parallel path
> would not be chosen, which I find somewhat objectionable.
>
> If the main goal of bailing out is to avoid doing the potentially
> expensive modification safety check on the target relation, maybe we
> should try to somehow make the check less expensive. I remember
> reading somewhere in the thread about caching the result of this check
> in relcache, but haven't closely studied the feasibility of doing so.
>

There's no "second-guessing" involved here.
There is no underlying way of dividing up the VALUES data of
"INSERT...VALUES" amongst the parallel workers, even if the planner
were updated to produce a parallel plan for the "INSERT...VALUES" case
(apart from the fact that spawning parallel workers to insert that
data would almost always result in worse performance than a
non-parallel plan...).
The division of work among parallel workers is part of the table AM
(scan) implementation, which is not invoked for "INSERT...VALUES".
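
For reference, the way the work actually gets divided for a parallel scan
is roughly the following (a condensed sketch of what the leader and the
workers do in nodeSeqscan.c, not a standalone example): the leader sets up
a shared ParallelTableScanDesc in dynamic shared memory, each worker
attaches to it, and the table AM then hands out chunks of the relation:

    /* Leader side (cf. ExecSeqScanInitializeDSM): allocate and initialize
     * the shared parallel-scan state in dynamic shared memory. */
    pscan = shm_toc_allocate(pcxt->toc,
                             table_parallelscan_estimate(rel, snapshot));
    table_parallelscan_initialize(rel, pscan, snapshot);

    /* Worker side (cf. ExecSeqScanInitializeWorker): attach to the shared
     * state; the table AM hands out blocks of the relation to each worker. */
    scan = table_beginscan_parallel(rel, pscan);

An "INSERT...VALUES" plan never reaches this machinery; its VALUES lists
are evaluated by a ValuesScan node, which has no equivalent mechanism for
handing rows out to parallel workers.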

Regards,
Greg Nancarrow
Fujitsu Australia
