Parallel Index Scans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Parallel Index Scans
Date: 2016-10-13 03:18:19
Message-ID: CAA4eK1J9GQPN0UaXKX8XnW4M-OeUFja4+ySt4=7FfRJBUxZ9vQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

As of now, the driving table for parallel query is accessed by
parallel sequential scan which limits its usage to a certain degree.
Parallelising index scans would further increase the usage of parallel
query in many more cases. This patch enables the parallelism for the
btree scans. Supporting parallel index scan for other index types
like hash, gist, spgist can be done as separate patches.

The basic idea is quite similar to parallel heap scans which is that
each worker (including leader whenever possible) will scan a block and
then get the next block that is required to be scan. The parallelism
in implemented at the leaf level of a btree. The first worker to
start a btree scan will scan till leaf and others will wait till the
first worker has reached till leaf. The first worker after reading
the leaf block will set the next block to be read and wake the first
worker waiting to scan the next block and proceed with scanning tuples
from the block it has read, similarly each worker after reading the
block, sets the next block to be read and wakes up the first waiting
worker. This is achieved by using the condition variable patch [1]
proposed by Robert. Parallelism is supported for both forward and
backward scans.

The optimizer will choose the parallelism based on number of pages in
index relation and cpu cost for evaluating the rows is divided equally
among workers. Index Scan node is made parallel aware and can be used
beneath Gather as shown below:

Current Plan for Index Scans
----------------------------------------
Index Scan using idx2 on test (cost=0.42..7378.96 rows=2433 width=29)
Index Cond: (c < 10)

Parallel version of plan
----------------------------------
Gather (cost=1000.42..1243.40 rows=2433 width=29)
Workers Planned: 1
-> Parallel Index Scan using idx2 on test (cost=0.42..0.10
rows=1431 width=29)
Index Cond: (c < 10)

The Parallel index scans can be used in parallelising aggregate
queries as well. For example, given a query like: select count(*)
from t1 where c1 > 1000 and c1 < 1100 and c2='aaa' Group By c2; below
form of parallel plans are possible:

Finalize HashAggregate
Group Key: c2
-> Gather
Workers Planned: 1
-> Partial HashAggregate
Group Key: c2
-> Parallel Index Scan using idx_t1_partial on t1
Index Cond: ((c1 > 1000) AND (c1 < 1100))
Filter: (c2 = 'aaa'::bpchar)

OR

Finalize GroupAggregate
Group Key: c2
-> Sort
-> Gather
Workers Planned: 1
-> Partial GroupAggregate
Group Key: c2
-> Parallel Index Scan using idx_t1_partial on t1
Index Cond: ((c1 > 1000) AND (c1 < 1100))
Filter: (c2 = 'aaa'::bpchar)

In the second plan (GroupAggregate), the Sort + Gather step would be
replaced with GatherMerge, once we have a GatherMerge node as proposed
by Rushabh [2]. Note, that above examples are just taken to explain
the usage of parallel index scan, actual plans will be selected based
on cost.

Performance tests
----------------------------
This test has been performed on community m/c (hydra, POWER-7).

Initialize pgbench with 3000 scale factor (./pgbench -i -s 3000 postgres)

Count the rows in pgbench_accounts based on values of aid and bid

Serial plan
------------------
set max_parallel_workers_per_gather=0;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
Aggregate (cost=4714590.52..4714590.53 rows=1 width=8) (actual
time=35684.425..35684.425 rows=1 loops=1)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.57..4707458.12 rows=2852961 width=4) (actual
time=29210.743..34385.271 rows=9900000 loops
=1)
Index Cond: ((aid > 1000) AND (aid < 90000000))
Filter: ((bid > 800) AND (bid < 900))
Rows Removed by Filter: 80098999
Planning time: 0.183 ms
Execution time: 35684.459 ms
(7 rows)

Parallel Plan
-------------------
set max_parallel_workers_per_gather=2;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------
Finalize Aggregate (cost=3924773.13..3924773.14 rows=1 width=8)
(actual time=15033.105..15033.105 rows=1 loops=1)
-> Gather (cost=3924772.92..3924773.12 rows=2 width=8) (actual
time=15032.986..15033.093 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=3923772.92..3923772.92 rows=1
width=8) (actual time=15030.354..15030.354 rows=1 loops=3)
-> Parallel Index Scan using pgbench_accounts_pkey on
pgbench_accounts (cost=0.57..3920801.08 rows=1188734 width=4) (actual
time=12476.068..14600.410 rows=3300000 loops=3)
Index Cond: ((aid > 1000) AND (aid < 90000000))
Filter: ((bid > 800) AND (bid < 900))
Rows Removed by Filter: 26699666
Planning time: 0.244 ms
Execution time: 15036.081 ms
(11 rows)

The above is a median of 3 runs, all the runs gave almost same
execution time. Here, we can notice that execution time is reduced by
more than half with two workers and I have tested with four workers
where time is reduced to one-fourth (9128.420 ms) of serial plan. I
think these results are quite similar to what we got for parallel
sequential scans. Another thing to note is that parallelising index
scans are more beneficial if there is a Filter which removes many rows
fetched from Index Scan or if the Filter is costly (example - filter
contains costly function execution). This observation is also quite
similar to what we have observed with Parallel Sequential Scans.

I think we can parallelise Index Only Scans as well, but I have not
evaluated the same and certainly it can be done as a separate patch in
future.

Contributions
--------------------
First patch (parallel_index_scan_v1.patch) implements parallelism at
IndexAM level - Rahila Syed and Amit Kapila based on design inputs and
suggestions by Robert Haas
Second patch (parallel_index_opt_exec_support_v1.patch) provides
optimizer and executor support for parallel index scans - Amit Kapila

The order to use these patches is first apply condition variable patch
[1] then parallel_index_scan_v1.patch and then
parallel_index_opt_exec_support_v1.patch

Thoughts?

[1] - https://www.postgresql.org/message-id/CAEepm%3D0zshYwB6wDeJCkrRJeoBM%3DjPYBe%2B-k_VtKRU_8zMLEfA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAGPqQf09oPX-cQRpBKS0Gq49Z%2Bm6KBxgxd_p9gX8CKk_d75HoQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_index_scan_v1.patch application/octet-stream 47.2 KB
parallel_index_opt_exec_support_v1.patch application/octet-stream 25.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-10-13 03:43:54 Re: Non-empty default log_line_prefix
Previous Message Michael Paquier 2016-10-13 03:01:31 Re: parallel.sgml