Proposal : Parallel Merge Join

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal : Parallel Merge Join
Date: 2016-12-10 12:59:11
Message-ID: CAFiTN-v3=cM6nyFwFGp0fmvY4=kk79Hq9Fgu0u8CSJ-EEq1Tiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I would like to propose a patch for parallelizing merge join path.
This idea is derived by analyzing TPCH results.

I have done this analysis along with my colleagues Rafia sabih and Amit kaplia.

Currently we already have infrastructure for executing parallel join,
so we don't need any change at executor level. Only in path
generation, we need to add partial paths for merge join, like we do
for nest loop and hash join.

After this patch, we will get some new type of paths as shown below.

-> Gather
-> MergeJoin
-> Sort
-> Partial Seq Scan
-> Any parallel safe inner path
or
-> Gather
-> MergeJoin
-> Partial Index Scan path (or any sorted partial path)
-> Any parallel safe inner path

Major benefit of this patch is, it can be helpful in converting some
paths to parallel paths where they were not using parallelism at all,
may be because we need merge join at intermediate level.

Performance:
------------------
- I applied this patch on head and tested TPCH (as of now only with
scale factor 10). And, observed that Q20 is giving 2x gain. On head,
this query was not using parallelism at all, but after parallel merge,
it's able to use parallelism all the way up to top level join (explain
analyze result is attached).

- I need to test this with different scale factor and also with other
patches like parallel index scan, gather merge and I hope to see more
paths getting benefited.

* 'gather merge' will be useful where we expect sorted path from merge
join, because parallel merge join will not give sorted result.

Test configuration and machine detail:
--------------------------------------------------
TPCH: scale factor : 10
work_mem : 100MB
shared buffer : 1GB
max_parallel_workers_per_gather : 3
Machine : POWER 4 socket machine.

Attached file:
------------------
1. parallel_mergejooin_v1.patch : parallel merge join patch
2. 20_head.out : Explain analyze output on head (median of 3 runs)
3. 20_patch.out : Explain analyze output with patch (median of 3 runs)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
20_head.out application/octet-stream 4.7 KB
20_patch.out application/octet-stream 5.6 KB
parallel_mergejoin_v1.patch application/octet-stream 23.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2016-12-10 18:29:05 Re: postgres_fdw bug in 9.6
Previous Message Amit Kapila 2016-12-10 12:06:17 Re: Parallel bitmap heap scan