Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-09-22 04:55:21
Message-ID: CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers,

Following on from Dilip Kumar's POC patch for allowing parallelism of
the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC
patch for allowing parallelism of both the INSERT and SELECT parts,
where it can be allowed.
For cases where it can't be allowed (e.g. INSERT into a table with
foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE
...") it at least allows parallelism of the SELECT part.
Obviously I've had to update the planner and executor and
parallel-worker code to make this happen, hopefully not breaking too
many things along the way.

Examples with patch applied:

(1) non-parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Insert on primary_tbl (cost=0.00..154.99 rows=9999 width=12) (actual
time=108.445..108.446 rows=0 loops=1)
-> Seq Scan on third_tbl (cost=0.00..154.99 rows=9999 width=12)
(actual time=0.009..5.282 rows=9999 loops=1)
Planning Time: 0.132 ms
Execution Time: 108.596 ms
(4 rows)

(2) parallel:

test=# explain analyze insert into primary_tbl select * from third_tbl;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..16.00 rows=9999 width=12) (actual
time=69.870..70.310 rows=0 loops=1)
Workers Planned: 5
Workers Launched: 5
-> Parallel Insert on primary_tbl (cost=0.00..16.00 rows=500
width=12) (actual time=59.948..59.949 rows=0 loops=6)
-> Parallel Seq Scan on third_tbl (cost=0.00..80.00
rows=2500 width=12) (actual time=0.014..0.922 rows=1666 loops=6)
Planning Time: 0.121 ms
Execution Time: 70.438 ms
(7 rows)

(3) parallel select only (insert into table with foreign key)

test=# explain analyze insert into secondary_tbl select * from third_tbl;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on secondary_tbl (cost=0.00..80.00 rows=9999 width=12)
(actual time=33.864..33.926 rows=0 loops=1)
-> Gather (cost=0.00..80.00 rows=9999 width=12) (actual
time=0.451..5.201 rows=9999 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on third_tbl (cost=0.00..80.00
rows=2500 width=12) (actual time=0.013..0.717 rows=2000 loops=5)
Planning Time: 0.127 ms
Trigger for constraint secondary_tbl_index_fkey: time=331.834 calls=9999
Execution Time: 367.342 ms
(8 rows)

Known issues/TODOs:
- Currently only for "INSERT INTO ... SELECT ...". To support "INSERT
INTO ... VALUES ..." would need additional Table AM functions for
dividing up the INSERT work amongst the workers (currently only exists
for scans).
- When INSERTs are made parallel, currently the reported row-count in
the "INSERT 0 <row-count>" status only reflects the rows that the
leader has processed (not the workers) - so it is obviously less than
the actual number of rows inserted.
- Functions relating to computing the number of parallel workers for
an INSERT, and the cost of an INSERT, need work.
- "force_parallel_mode" handling was updated so that it only affects
SELECT (not INSERT) - can't allow it for INSERT because we're only
supporting "INSERT INTO .. SELECT ..." and don't support other types
of INSERTs, and also can't allow attempted parallel UPDATEs resulting
from "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE" etc.

Thoughts and feedback?

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment Content-Type Size
0001-ParallelInsertSelect.patch application/octet-stream 28.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2020-09-22 04:59:49 Use appendStringInfoString and appendPQExpBufferStr where possible
Previous Message Thomas Munro 2020-09-22 03:33:54 Re: Parallel Full Hash Join