[Proposal] Table partition + join pushdown

From: Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Akio Iwaasa <aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com>
Subject: [Proposal] Table partition + join pushdown
Date: 2015-08-13 09:29:34
Message-ID: 12A9442FBAE80D4E8953883E0B84E0885C01FD@BPXM01GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

I saw the email about the idea from KaiGai-san[1],
and I worked to implement this idea.

Now, I have implemented a part of this idea,
so I want to propose this feature.

Patch attached just shows my concept of this feature.
It works fine for EXPLAIN, but it returns wrong result for other operations, sadly.

Table partition + join pushdown
===============================

Motivation
----------
To make join logic working more effectively,
it is important to make the size of relations smaller.

Especially in Hash-join, it is meaningful to make the inner relation smaller,
because smaller inner relation can be stored within smaller hash table.
This means that memory usage can be reduced when joining with big tables.

Design
------
It was mentioned by the email from KaiGai-san.
So I quote below here...

---- begin quotation ---
Let's assume a table which is partitioned to four portions,
and individual child relations have constraint by hash-value
of its ID field.

tbl_parent
+ tbl_child_0 ... CHECK(hash_func(id) % 4 = 0)
+ tbl_child_1 ... CHECK(hash_func(id) % 4 = 1)
+ tbl_child_2 ... CHECK(hash_func(id) % 4 = 2)
+ tbl_child_3 ... CHECK(hash_func(id) % 4 = 3)

If someone tried to join another relation with tbl_parent
using equivalence condition, like X = tbl_parent.ID, we
know inner tuples that does not satisfies the condition
hash_func(X) % 4 = 0
shall be never joined to the tuples in tbl_child_0.
So, we can omit to load these tuples to inner hash table
preliminary, then it potentially allows to split the
inner hash-table.

Current typical plan structure is below:

HashJoin
-> Append
-> SeqScan on tbl_child_0
-> SeqScan on tbl_child_1
-> SeqScan on tbl_child_2
-> SeqScan on tbl_child_3
-> Hash
-> SeqScan on other_table

It may be rewritable to:

Append
-> HashJoin
-> SeqScan on tbl_child_0
-> Hash ... Filter: hash_func(X) % 4 = 0
-> SeqScan on other_table
-> HashJoin
-> SeqScan on tbl_child_1
-> Hash ... Filter: hash_func(X) % 4 = 1
-> SeqScan on other_table
-> HashJoin
-> SeqScan on tbl_child_2
-> Hash ... Filter: hash_func(X) % 4 = 2
-> SeqScan on other_table
-> HashJoin
-> SeqScan on tbl_child_3
-> Hash ... Filter: hash_func(X) % 4 = 3
-> SeqScan on other_table
---- end quotation ---

In the quotation above, it was written that filter is set at Hash node.
But I implemented that filter is set at SeqScan node under Hash node.
In my opinion, filtering tuples is work of Scanner.

Append
-> HashJoin
-> SeqScan on tbl_child_0
-> Hash
-> SeqScan on other_table ... Filter: hash_func(X) % 4 = 0
-> HashJoin
-> SeqScan on tbl_child_1
-> Hash
-> SeqScan on other_table ... Filter: hash_func(X) % 4 = 1
-> HashJoin
-> SeqScan on tbl_child_2
-> Hash
-> SeqScan on other_table ... Filter: hash_func(X) % 4 = 2
-> HashJoin
-> SeqScan on tbl_child_3
-> Hash
-> SeqScan on other_table ... Filter: hash_func(X) % 4 = 3

API
---
There are 3 new internal (static) functions to implement this feature.
try_hashjoin_pushdown(), which is main function of this feature,
is called from try_hashjoin_path(), and tries to push down HashPath
under the AppendPath.

To do so, this function does following operations.

1. Check if this Hash-join can be pushed down under AppendPath
2. To avoid an influence on other Path making operation,
copy inner path's RelOptInfo and make new SeqScan path from it.
At here, get CHECK() constraints from OUTER path, and convert its
Var node according to join condition. And also convert Var nodes
in join condition itself.
3. Create new HashPath nodes between each sub-paths of AppendPath and
inner path made above.
4. When operations from 1 to 3 are done for each sub-paths,
create new AppendPath which sub-paths are HashPath nodes made above.

get_replaced_clause_constr() is called from try_hashjoin_pushdown(),
and get_var_nodes_recurse() is called from get_replaced_cluase_constr().
These 2 functions help above operations.
(I may revise this part to use expression_tree_walker() and
expression_tree_mutator().)

In patch attached, it has the following limitations.
o It only works for hash-join operation.
(I want to support not only hash-join but also other logic.)
o Join conditions must be "=" operator with int4 variables.
o Inner path must be SeqScan.
(I want to support other path-node.)
o And now, planner may not choose this plan,
because estimated costs are usually larger than original (non-pushdown) plan.

And also 1 internal (static) function, get_relation_constraints() defined in
plancat.c, is changed to global. This function will be called from
get_replaced_clause_constr() to get CHECK() constraints.

Usage
-----
To use this feature, create partition tables and small table to join,
and run select operation with joining these tables.

For your convenience, I attach DDL and DML script.
And I also attach the result of EXPLAIN.

Any comments are welcome. But, at first, I need your advices
to correct this patch's behavior.

At least, I think it has to expand array of RangeTblEntry and other arrays defined
in PlannerInfo to register new RelOptInfos for new Path nodes mentioned above.
Or, is it better choice to modify query parser to implement this feature more further?

Remarks :
[1] http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010F672B@BPXM15GP.gisp.nec.co.jp

Best regards,
--
Taiki Kondo

NEC Solution Innovators, Ltd.

Attachment Content-Type Size
EXPLAIN (Original).txt text/plain 963 bytes
EXPLAIN (Pushdown).txt text/plain 2.8 KB
hashjoin_pushdown.v0.patch application/octet-stream 14.1 KB
test_queries.sql application/octet-stream 1.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-08-13 09:38:37 Re: [COMMITTERS] pgsql: Close some holes in BRIN page assignment
Previous Message Tatsuo Ishii 2015-08-13 09:03:50 Views created by UNION ALL