Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-16 09:07:51
Message-ID: CAJcOf-dCT3QXGpmrWRM4P0y5Pt5PYGe6jmQ+ZLWteGvfdFLCoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 8, 2021 at 8:13 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>
> > > Did it actually use a parallel plan in your testing?
> > > When I ran these tests with the Parallel INSERT patch applied, it did
> > > not naturally choose a parallel plan for any of these cases.
> >
> > Yes, these cases pick parallel plan naturally on my test environment.
> >
> > postgres=# explain verbose insert into testscan select a from x where
> > a<80000 or (a%2=0 and a>199900000);
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > -----------------------------
> > Gather (cost=4346.89..1281204.64 rows=81372 width=0)
> > Workers Planned: 4
> > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0
> > width=0)
> > -> Parallel Bitmap Heap Scan on public.x1
> > (cost=3346.89..1272067.44 rows=20343 width=8)
> > Output: x1.a, NULL::integer
> > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND (x1.a >
> > 199900000)))
> > -> BitmapOr (cost=3346.89..3346.89 rows=178808
> > width=0)
> > -> Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1495.19 rows=80883 width=0)
> > Index Cond: (x1.a < 80000)
> > -> Bitmap Index Scan on x1_a_idx
> > (cost=0.00..1811.01 rows=97925 width=0)
> > Index Cond: (x1.a > 199900000)
> >
> > PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't
> > do any parameters tuning in my test session.
>
> I reproduced this on my machine.
>
> I think we'd better do "analyze" before insert which helps reproduce this easier.
> Like:
>
> -----
> analyze;
> explain analyze verbose insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
> -----
>

Thanks, I tried test_bimap.sql in my own environment, and added
"analyze", and I also found it naturally chose a parallel INSERT with
parallel bitmap heap scan for each of these cases.
However, I didn't see any performance degradation when compared
against serial INSERT with bitmap heap scan.
The parallel plan in these cases seems to run a bit faster.
(Note that I'm using a release build of Postgres, and using default
postgresql.conf)

test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan select a from x
where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=210.587..212.135 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Insert on public.testscan (cost=3193.29..1247014.24 rows=0
width=0) (actual time=195.296..195.298 rows=0 loops=5)
Worker 0: actual time=189.512..189.514 rows=0 loops=1
Worker 1: actual time=194.843..194.844 rows=0 loops=1
Worker 2: actual time=193.986..193.988 rows=0 loops=1
Worker 3: actual time=188.035..188.037 rows=0 loops=1
-> Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=7.992..25.837 row
s=26000 loops=5)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
Rows Removed by Filter: 10000
Heap Blocks: exact=261
Worker 0: actual time=1.473..14.458 rows=22465 loops=1
Worker 1: actual time=7.370..31.359 rows=30525 loops=1
Worker 2: actual time=8.765..19.838 rows=18549 loops=1
Worker 3: actual time=0.279..17.269 rows=23864 loops=1
-> BitmapOr (cost=3193.29..3193.29 rows=170535
width=0) (actual time=21.775..21.777 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=11.961..11.961
rows=
79999 loops=1)
Index Cond: (x.a < 80000)
-> Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.809..9.809
rows=10
0000 loops=1)
Index Cond: (x.a > 199900000)
Planning Time: 0.276 ms
Execution Time: 212.189 ms
(25 rows)

test=# truncate testscan;
TRUNCATE TABLE
test=# set max_parallel_workers_per_gather=0;
SET
test=# explain analyze verbose insert into testscan select a from x
where a<80000 or (a%2=0 and a>199900000);
QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------
Insert on public.testscan (cost=3193.29..3625636.35 rows=0 width=0)
(actual time=241.222..241.224 rows=0 loops=1)
-> Bitmap Heap Scan on public.x (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=16.945..92.392 rows=129999 loops
=1)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
Rows Removed by Filter: 50000
Heap Blocks: exact=975
-> BitmapOr (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=16.735..16.736 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.222..9.223 rows=79999 lo
ops=1)
Index Cond: (x.a < 80000)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1790.21
rows=96752 width=0) (actual time=7.511..7.511 rows=100000 l
oops=1)
Index Cond: (x.a > 199900000)
Planning Time: 0.205 ms
Execution Time: 241.274 ms
(14 rows)

============

test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_pk select a from x
where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=572.242..573.683 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Insert on public.testscan_pk (cost=3193.29..1247014.24 rows=0
width=0) (actual time=566.303..566.308 rows=0 loops=5)
Worker 0: actual time=566.756..566.757 rows=0 loops=1
Worker 1: actual time=564.778..564.779 rows=0 loops=1
Worker 2: actual time=564.402..564.419 rows=0 loops=1
Worker 3: actual time=563.748..563.749 rows=0 loops=1
-> Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=16.479..37.327 ro
ws=26000 loops=5)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
Rows Removed by Filter: 10000
Heap Blocks: exact=204
Worker 0: actual time=17.358..36.895 rows=24233 loops=1
Worker 1: actual time=12.711..33.538 rows=25616 loops=1
Worker 2: actual time=15.671..35.701 rows=24831 loops=1
Worker 3: actual time=17.656..39.310 rows=26645 loops=1
-> BitmapOr (cost=3193.29..3193.29 rows=170535
width=0) (actual time=18.541..18.542 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=8.549..8.549
rows=79
999 loops=1)
Index Cond: (x.a < 80000)
-> Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.990..9.990
rows=10
0000 loops=1)
Index Cond: (x.a > 199900000)
Planning Time: 0.240 ms
Execution Time: 573.733 ms
(25 rows)

test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_pk;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_pk select a from x
where a<80000 or (a%2=0 and a>199900000);
QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------
Insert on public.testscan_pk (cost=3193.29..3625636.35 rows=0
width=0) (actual time=598.997..598.998 rows=0 loops=1)
-> Bitmap Heap Scan on public.x (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=20.153..96.858 rows=129999 loops
=1)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
Rows Removed by Filter: 50000
Heap Blocks: exact=975
-> BitmapOr (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=19.840..19.841 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.276..9.276 rows=79999 lo
ops=1)
Index Cond: (x.a < 80000)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1790.21
rows=96752 width=0) (actual time=10.562..10.562 rows=100000
loops=1)
Index Cond: (x.a > 199900000)
Planning Time: 0.204 ms
Execution Time: 599.098 ms
(14 rows)

============

test=# set max_parallel_workers_per_gather=4;
SET
test=# explain analyze verbose insert into testscan_index select a
from x where a<80000 or (a%2=0 and a>199900000);

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=4193.29..1255440.94 rows=74267 width=0) (actual
time=560.460..562.386 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Insert on public.testscan_index (cost=3193.29..1247014.24
rows=0 width=0) (actual time=553.434..553.435 rows=0 loops=5)
Worker 0: actual time=548.751..548.752 rows=0 loops=1
Worker 1: actual time=552.008..552.009 rows=0 loops=1
Worker 2: actual time=553.094..553.095 rows=0 loops=1
Worker 3: actual time=553.389..553.390 rows=0 loops=1
-> Parallel Bitmap Heap Scan on public.x
(cost=3193.29..1247014.24 rows=18567 width=8) (actual
time=13.759..34.487 ro
ws=26000 loops=5)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a >
199900000)))
Rows Removed by Filter: 10000
Heap Blocks: exact=183
Worker 0: actual time=8.698..29.924 rows=26173 loops=1
Worker 1: actual time=12.865..33.889 rows=27421 loops=1
Worker 2: actual time=13.088..32.823 rows=24591 loops=1
Worker 3: actual time=14.075..36.349 rows=26571 loops=1
-> BitmapOr (cost=3193.29..3193.29 rows=170535
width=0) (actual time=19.356..19.357 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx
(cost=0.00..1365.94 rows=73783 width=0) (actual time=10.330..10.330
rows=
79999 loops=1)
Index Cond: (x.a < 80000)
-> Bitmap Index Scan on x_a_idx
(cost=0.00..1790.21 rows=96752 width=0) (actual time=9.024..9.024
rows=10
0000 loops=1)
Index Cond: (x.a > 199900000)
Planning Time: 0.219 ms
Execution Time: 562.442 ms
(25 rows)

test=# set max_parallel_workers_per_gather=0;
SET
test=# truncate testscan_index;
TRUNCATE TABLE
test=# explain analyze verbose insert into testscan_index select a
from x where a<80000 or (a%2=0 and a>199900000);
QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------
Insert on public.testscan_index (cost=3193.29..3625636.35 rows=0
width=0) (actual time=607.619..607.621 rows=0 loops=1)
-> Bitmap Heap Scan on public.x (cost=3193.29..3625636.35
rows=74267 width=8) (actual time=21.001..96.283 rows=129999 loops
=1)
Output: x.a, NULL::integer
Recheck Cond: ((x.a < 80000) OR (x.a > 199900000))
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
Rows Removed by Filter: 50000
Heap Blocks: exact=975
-> BitmapOr (cost=3193.29..3193.29 rows=170535 width=0)
(actual time=20.690..20.691 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1365.94
rows=73783 width=0) (actual time=9.097..9.097 rows=79999 lo
ops=1)
Index Cond: (x.a < 80000)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1790.21
rows=96752 width=0) (actual time=11.591..11.591 rows=100000
loops=1)
Index Cond: (x.a > 199900000)
Planning Time: 0.205 ms
Execution Time: 607.734 ms
(14 rows)

Even when I changed the queries to return more rows from the scan, to
the point where it chose not to use a parallel INSERT bitmap heap scan
(in favour of parallel seq scan), and then forced it to by disabling
seqscan, I found that it was still at least as fast as serial INSERT
with bitmap heap scan.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Takashi Menjo 2021-02-16 09:10:11 Re: [PoC] Non-volatile WAL buffer
Previous Message Andrey V. Lepikhov 2021-02-16 08:23:34 Re: [POC] Fast COPY FROM command for the table with foreign partitions