Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)cn(dot)fujitsu(dot)com" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 08:22:37
Message-ID: CA+HiwqG01zK2GJBQHKOJ0CFxGXAM7EowmZwSycORtXQisf2PEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 10, 2021 at 5:03 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> From: Amit Langote <amitlangote09(at)gmail(dot)com>
> > On Wed, Feb 10, 2021 at 1:35 PM Greg Nancarrow <gregn4422(at)gmail(dot)com>
> > wrote:
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel-plan for the "INSERT...VALUES" case
> > > (apart from the fact that spawning off parallel workers to insert that
> > > data would almost always result in worse performance than a
> > > non-parallel plan...)
> > > The division of work for parallel workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> >
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me. Maybe that's just me
> > though.
>
> In terms of competitiveness, Oracle does not run INSERT VALUES in parallel:
>
> https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-6626C70C-876C-47A4-8C01-9B66574062D8
>
> "The INSERT VALUES statement is never executed in parallel."
>
>
> And SQL Server doesn't either:
>
> https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15
>
> "Starting with SQL Server 2016 (13.x) and database compatibility level 130, the INSERT … SELECT statement can be executed in parallel when inserting into heaps or clustered columnstore indexes (CCI), and using the TABLOCK hint."

Just to be clear, I'm not suggesting that we should put effort into
making INSERT ... VALUES run in parallel. I'm just raising my concern
about embedding the assumption in max_parallel_hazard() that it will
never make sense to do so.

Although, maybe there are other more pressing concerns to resolve, so
I will not insist too much on doing anything about this.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-10 08:23:50 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Markus Wanner 2021-02-10 08:10:19 Re: repeated decoding of prepared transactions