Parallel bitmap heap scan

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Parallel bitmap heap scan
Date: 2016-10-07 06:16:40
Message-ID: CAFiTN-ukcfSsZ9GrBcEJG0O0Kbv2WXgOofJUHRQEm1zuH12xEg@mail.gmail.com
Lists: pgsql-hackers

Hi Hackers,

I would like to propose a parallel bitmap heap scan feature. While
running the TPCH benchmark, I observed that many TPCH queries use a
bitmap scan (see TPCH_plan.tar.gz, attached below). With this in mind,
we expect many queries to benefit from a parallel bitmap scan.

Robert also pointed out the same thing in his blog post on parallel query:
http://rhaas.blogspot.in/2016/04/postgresql-96-with-parallel-query-vs.html

Currently, the bitmap heap plan looks like this:
------------------------------------------------------
Bitmap Heap Scan
-> Bitmap Index Scan

After this patch :
---------------------
Parallel Bitmap Heap Scan
-> Bitmap Index Scan

As part of this work I have implemented parallel processing in the
BitmapHeapScan node. BitmapIndexScan remains non-parallel.

Brief design idea:
-----------------------
#1. Shared TIDBitmap creation and initialization
The first worker to see the parallel bitmap state as PBM_INITIAL
becomes the leader and sets the state to PBM_INPROGRESS. All other
workers, seeing the state as PBM_INPROGRESS, wait for the leader to
complete the TIDBitmap.

#2. At this point the TIDBitmap is ready and all workers are awake.

#3. Bitmap processing (iterate over and process the pages).
In this phase each worker iterates over the page and chunk array and
selects heap pages one by one. If prefetch is enabled, there will be
two iterators. Since multiple workers iterate over the same page and
chunk array, we need a shared iterator: we grab a spinlock and advance
the iterator under the lock, so that each worker gets a different page
to process.

Note: For more details on the design, please refer to the comments on
the BitmapHeapNext API in "parallel-bitmap-heap-scan-v1.patch".

Attached patch details:
------------------------------
1. parallel-bitmap-heap-scan-v1.patch: This is the main patch, which
makes the bitmap heap scan node parallel-aware.

2. dht-return-dsa-v1.patch: This patch provides a new API for scanning
the full DHT[1] and getting dsa_pointers (relative pointers).
dsa_pointer values can be shared with other processes. We need this
because, after the TIDBitmap is created, only one worker processes the
whole TIDBitmap and converts it to a page and chunk array. So we need
to store generic (relative) pointers, which each worker later converts
to its own local pointers before it starts processing.

My patch depends on the following patches.
------------------------------------------------------
1. conditional_variable
https://www.postgresql.org/message-id/CAEepm%3D0zshYwB6wDeJCkrRJeoBM%3DjPYBe%2B-k_VtKRU_8zMLEfA%40mail.gmail.com

2. dsa_area
https://www.postgresql.org/message-id/CAEepm%3D024p-MeAsDmG%3DR3%2BtR4EGhuGJs_%2BrjFKF0eRoSTmMJnA%40mail.gmail.com

3. Creating a DSA area to provide work space for parallel execution
https://www.postgresql.org/message-id/CAEepm%3D0HmRefi1%2BxDJ99Gj5APHr8Qr05KZtAxrMj8b%2Bay3o6sA%40mail.gmail.com

4. Hash table in dynamic shared memory (DHT) [1]
https://www.postgresql.org/message-id/CAEepm%3D0VrMt3s_REDhQv6z1pHL7FETOD7Rt9V2MQ3r-2ss2ccA%40mail.gmail.com

Order in which patches should be applied:
--------------------------------------------------------
1. conditional_variable
2. dsa_area
3. Creating a DSA area to provide work space for parallel execution
4. Hash table in dynamic shared memory.
5. dht-return-dsa-v1.patch
6. parallel-bitmap-heap-scan-v1.patch

Performance Results:
-----------------------------
Summary :
1. After this patch, I observed that 4 queries currently get a
significant improvement (Q4, Q6, Q14, Q15).
- Q4 switches from a parallel seq scan to a parallel bitmap heap scan.
- The other queries switch from a regular bitmap heap scan to a
parallel bitmap heap scan.
2. The benefit is more visible at lower worker counts (up to 4); beyond
that, some of the queries select ParallelSeqScan over
ParallelBitmapScan. I think this is expected: so far only the
BitmapHeap node has been made parallel, whereas ParallelSeqScan is
completely parallel, so at higher worker counts ParallelSeqScan is the
better choice.
3. Detailed results are attached in TPCH_PBMS.pdf.
4. Explain analyze output is attached in TPCH_plan.tar.gz (for all
changed queries at 2 workers).

TPCH query plan changed example (TPCH Q6):
----------------------------------------------------------------
On Head:
-------------

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1558475.95..1558475.96 rows=1 width=32) (actual
time=40921.437..40921.438 rows=1 loops=1)
-> Aggregate (cost=1558475.95..1558475.96 rows=1 width=32)
(actual time=40921.435..40921.435 rows=1 loops=1)
-> Bitmap Heap Scan on lineitem (cost=291783.32..1552956.39
rows=1103911 width=12) (actual time=7032.075..38997.369 rows=1140434
loops=1)
Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND
(l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone) AND
(l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity <
'24'::numeric))
Rows Removed by Index Recheck: 25284232
Heap Blocks: exact=134904 lossy=530579
-> Bitmap Index Scan on idx_lineitem_shipdate
(cost=0.00..291507.35 rows=1103911 width=0) (actual
time=6951.408..6951.408 rows=1140434 loops=1)
Index Cond: ((l_shipdate >= '1994-01-01'::date)
AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone)
AND (l_discount >= 0.01) AND (l_discount <= 0.03) AND (l_quantity <
'24'::numeric))
Planning time: 1.126 ms
Execution time: 40922.569 ms
(10 rows)

After Patch:
----------------

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1541767.60..1541767.61 rows=1 width=32) (actual
time=21895.008..21895.009 rows=1 loops=1)
-> Finalize Aggregate (cost=1541767.60..1541767.61 rows=1
width=32) (actual time=21895.006..21895.006 rows=1 loops=1)
-> Gather (cost=1541767.38..1541767.59 rows=2 width=32)
(actual time=21894.341..21894.970 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1540767.38..1540767.39
rows=1 width=32) (actual time=21890.990..21890.990 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on lineitem
(cost=291783.32..1538467.56 rows=459963 width=12) (actual
time=8517.126..21215.469 rows=380145 loops=3)
Recheck Cond: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03)
AND (l_quantity < '24'::numeric))
Rows Removed by Index Recheck: 8427921
Heap Blocks: exact=47761 lossy=187096
-> Bitmap Index Scan on
idx_lineitem_shipdate (cost=0.00..291507.35 rows=1103911 width=0)
(actual time=8307.291..8307.291 rows=1140434 loops=1)
Index Cond: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_discount >= 0.01) AND (l_discount <= 0.03)
AND (l_quantity < '24'::numeric))
Planning time: 1.173 ms
Execution time: 21915.931 ms
(14 rows)

* Thanks to Robert Haas and Amit Kapila for helping with the design
review (off list) and for many valuable inputs.
* Thanks to Thomas Munro for the DSA and DHT work on which my patch is based.
* Thanks to Rafia Sabih for helping with the performance tests.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
benchmark_machine_info.txt text/plain 607 bytes
dht-return-dsa-v1.patch application/octet-stream 3.6 KB
parallel-bitmap-heap-scan-v1.patch application/octet-stream 72.3 KB
TPCH_PBMS.pdf application/pdf 38.0 KB
TPCH_Plan.tar.gz application/x-gzip 3.4 KB
