RE: Parallel INSERT (INTO ... SELECT ...)

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 10:22:23
Message-ID: d4484e3257eb45a1bf23d22d1683b008@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> What are the results if disable the bitmap heap scan(Set enable_bitmapscan
> = off)? If that happens to be true, then we might also want to consider
> if in some way we can teach parallel insert to cost more in such cases.
> Another thing we can try is to integrate a parallel-insert patch with the
> patch on another thread [1] and see if that makes any difference but not
> sure if we want to go there at this stage unless it is simple to try that
> out?

If we diable bitmapscan, the performance degradation seems will not happen.

[Parallel]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..2090216.68 rows=81338 width=0) (actual time=0.226..5488.455 rows=0 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
WAL: records=260400 bytes=16549513
-> Insert on public.testscan (cost=0.00..2081082.88 rows=0 width=0) (actual time=5483.113..5483.114 rows=0 loops=4)
Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
WAL: records=260400 bytes=16549513
Worker 0: actual time=5483.116..5483.117 rows=0 loops=1
Buffers: shared hit=36306 read=264288 dirtied=100 written=49
WAL: records=23895 bytes=1575860
Worker 1: actual time=5483.220..5483.222 rows=0 loops=1
Buffers: shared hit=39750 read=280476 dirtied=101 written=106
WAL: records=26141 bytes=1685083
Worker 2: actual time=5482.844..5482.845 rows=0 loops=1
Buffers: shared hit=38660 read=263713 dirtied=105 written=250
WAL: records=25318 bytes=1657396
Worker 3: actual time=5483.272..5483.274 rows=0 loops=1
Buffers: shared hit=278648 read=271058 dirtied=678 written=622
WAL: records=185046 bytes=11631174
-> Parallel Seq Scan on public.x (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
Output: x.a, NULL::integer
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
Rows Removed by Filter: 49967500
Buffers: shared hit=1551 read=1079531
Worker 0: actual time=5335.456..5340.757 rows=11924 loops=1
Buffers: shared hit=281 read=264288
Worker 1: actual time=5335.559..5341.766 rows=13049 loops=1
Buffers: shared hit=281 read=280476
Worker 2: actual time=5335.534..5340.964 rows=12636 loops=1
Buffers: shared hit=278 read=263712
Worker 3: actual time=0.015..5125.503 rows=92390 loops=1
Buffers: shared hit=711 read=271055
Planning:
Buffers: shared hit=19
Planning Time: 0.175 ms
Execution Time: 5488.493 ms

[Serial]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Insert on public.testscan (cost=0.00..5081085.52 rows=0 width=0) (actual time=19311.642..19311.643 rows=0 loops=1)
Buffers: shared hit=392485 read=1079694 dirtied=934 written=933
WAL: records=260354 bytes=16259841
-> Seq Scan on public.x (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=129999 loops=1)
Output: x.a, NULL::integer
Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000)))
Rows Removed by Filter: 199870001
Buffers: shared hit=1391 read=1079691
Planning:
Buffers: shared hit=10
Planning Time: 0.125 ms
Execution Time: 19311.700 ms

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-02-10 10:29:10 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Masahiko Sawada 2021-02-10 10:19:25 Re: 64-bit XIDs in deleted nbtree pages