Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-07-17 05:54:54
Message-ID: CAA4eK1+J28VHR2bGBhQDgR4Qdf59yCy-d67MgazBwZGDD_MDUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 16, 2020 at 6:43 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jul 15, 2020 at 11:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > The attached patch fixes the comments. Let me know if you think I
> > have missed anything or any other comments.
>
> If it's safe now, why not remove the error check?
>

I think it is not safe for all kind of Inserts (see my response later
in email), so we need some smarts to identify un-safe inserts before
we can open this check.

> (Is it safe? Could there be other problems?)
>

I think we need to be careful of two things: (a) Do we want to enable
parallel inserts where tuple locks are involved, forex. in statements
like "Insert into primary_tbl Select * from secondary_tbl Where col <
10 For Update;"? In such statements, I don't see any problem because
each worker will operate on a separate page and even if the leader
already has a lock on the tuple, it will be granted to the worker as
it is taken in the same transaction. (b) The insert statements that
can generate 'CommandIds' which can happen while insert into tables
with foreign keys, see below test:

CREATE TABLE primary_tbl(index INTEGER PRIMARY KEY, height real, weight real);
insert into primary_tbl values(1, 1.1, 100);
insert into primary_tbl values(2, 1.2, 100);
insert into primary_tbl values(3, 1.3, 100);

CREATE TABLE secondary_tbl(index INTEGER REFERENCES
primary_tbl(index), height real, weight real);

insert into secondary_tbl values(generate_series(1,3),1.2,200);

Here we can't parallelise statements like "insert into secondary_tbl
values(generate_series(1,3),1.2,200);" as they will generate
'CommandIds' for each row insert into table with foreign key. The new
command id is generated while performing a foreign key check. Now, it
is a separate question whether generating a command id for each row
insert is required or not but as of now we can't parallelize such
statements.

Do you have something else in mind?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-07-17 05:57:03 Re: Have SIGHUP instead of SIGTERM for config reload in logical replication launcher
Previous Message Michael Paquier 2020-07-17 05:45:10 Re: Added tab completion for the missing options in copy statement