Quick Links

Re: [Proposal] Table partition + join pushdown

From:	Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>
To:	Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc:	Akio Iwaasa <aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [Proposal] Table partition + join pushdown
Date:	2015-09-24 11:06:27
Message-ID:	12A9442FBAE80D4E8953883E0B84E0885E6AA9@BPXM01GP.gisp.nec.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello, KaiGai-san.

Thank you for your comment, and sorry for late response.

The attached patch is completely rewritten from previous patch[1], at your suggestion[2].
Please find attached.

This patch contains following implementation, but I can't determine this is correct or wrong.

1. Cost estimation
In this patch, additional row filter is implemented for Hash, Merge join and Nested Loop.
I implemented cost estimation feature for this filter by watching other parts of filters,
but I am not sure this implementation is correct.

2. Workaround for failing assertion at allpaths.c
In standard_join_search(), we expect to have a single rel at the final level.
But this expectation is disappointed by join pushdown feature, because this will
search for the combinations not searched by original standard_join_serch()
at the final level. Therefore, once trying join pushdown is succeeded,
failing assertion occurs in allpaths.c.

So I implemented workaround by temporary set NULL to root->join_rel_level while
trying join pushdown, but I think this implementation may be wrong.

3. Searching pathkeys for Merge Join
When join pushdown feature chooses merge join for pushed-down join operation,
planner fails to create merge join node because it is unable to find pathkeys
for this merge join. I found this is caused by skipping child table in finding
pathkeys.

I expect that it is for making planner faster, and I implemented that
planner doesn't skip child table in finding pathkeys for merge join.
But I am not sure this expectation is correct.

Any comments/suggestions are welcome.

Remarks :
[1] http://www.postgresql.org/message-id/12A9442FBAE80D4E8953883E0B84E0885C01FD@BPXM01GP.gisp.nec.co.jp
[2] http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8011345B6@BPXM15GP.gisp.nec.co.jp

Best regards,
--
Taiki Kondo

NEC Solution Innovators, Ltd.

-----Original Message-----
From: Kaigai Kouhei(海外浩平) [mailto:kaigai(at)ak(dot)jp(dot)nec(dot)com]
Sent: Tuesday, August 18, 2015 5:47 PM
To: Kondo Taiki(近藤太樹); pgsql-hackers(at)postgresql(dot)org
Cc: Iwaasa Akio(岩浅晃郎)
Subject: RE: [Proposal] Table partition + join pushdown

Hello Kondo-san,

I briefly checked your patch. Let me put some comments about its design and implementation, even though I have no arguments towards its concept. :-)

* Construction of RelOptInfo

In your patch, try_hashjoin_pushdown() called by try_hashjoin_path() constructs RelOptInfo of the join-rel between inner-rel and a subpath of Append node. It is entirely wrong implementation.

I can understand we (may) have no RelOptInfo for the joinrel between
tbl_child_0 and other_table, when planner investigates a joinpath to process join Append path with the other_table.

> HashJoin
> -> Append
> -> SeqScan on tbl_child_0
> -> SeqScan on tbl_child_1
> -> SeqScan on tbl_child_2
> -> SeqScan on tbl_child_3
> -> Hash
> -> SeqScan on other_table
>
How about these alternatives?

- calls make_join_rel() to the pair of tbl_child_X and other_table
by try_hashjoin_pushdown() or somewhere. make_join_rel() internally
construct a RelOptInfo for the supplied pair of relations, so
relevant RelOptInfo shall be properly constructed.
- make_join_rel() also calls add_paths_to_joinrel() towards all the
join logic, so it makes easier to support to push down other join
logic including nested-loop or custom-join.
- It may be an idea to add an extra argument to make_join_rel() to
inform expressions to be applied for tuple filtering on
construction of inner hash table.

* Why only SeqScan is supported

I think it is the role of Hash-node to filter out inner tuples obviously unrelated to the join (if CHECK constraint of outer relation gives information), because this join-pushdown may be able to support multi-stacked pushdown.

For example, if planner considers a path to join this Append-path with another relation, and join clause contains a reference to X?

> Append
> -> HashJoin
> -> SeqScan on tbl_child_0
> -> Hash ... Filter: hash_func(X) % 4 = 0
> -> SeqScan on other_table
> -> HashJoin
> -> SeqScan on tbl_child_1
> -> Hash ... Filter: hash_func(X) % 4 = 1
> -> SeqScan on other_table
> -> HashJoin
> -> SeqScan on tbl_child_2
> -> Hash ... Filter: hash_func(X) % 4 = 2
> -> SeqScan on other_table
> -> HashJoin
> -> SeqScan on tbl_child_3
> -> Hash ... Filter: hash_func(X) % 4 = 3
> -> SeqScan on other_table
>
It may be a good challenge to consider additional join pushdown, even if subpaths of Append are HashJoin, not SeqScan, like:

Append
-> HashJoin
-> HashJoin
-> SeqScan on tbl_child_0
-> Hash ... Filter: hash_func(X) % 4 = 0
-> SeqScan on other_table
-> Hash ... Filter: hash_func(X) % 4 = 0
-> SeqScan on another_table
-> HashJoin
-> HashJoin
-> SeqScan on tbl_child_1
-> Hash ... Filter: hash_func(X) % 4 = 1
-> SeqScan on other_table
-> Hash ... Filter: hash_func(X) % 4 = 1
-> SeqScan on another_table
-> HashJoin
-> HashJoin
-> SeqScan on tbl_child_2
-> Hash ... Filter: hash_func(X) % 4 = 2
-> SeqScan on other_table
-> Hash ... Filter: hash_func(X) % 4 = 2
-> SeqScan on another_table
-> HashJoin
-> HashJoin
-> SeqScan on tbl_child_3
-> Hash ... Filter: hash_func(X) % 4 = 3
-> SeqScan on other_table
-> Hash ... Filter: hash_func(X) % 4 = 3
-> SeqScan on another_table

In this case, underlying nodes are not always SeqScan. So, only Hash-node can have filter clauses.

* Way to handle expression nodes

All this patch supported is CHECK() constraint with equal operation on INT4 data type. You can learn various useful infrastructure of PostgreSQL. For example, ...
- expression_tree_mutator() is useful to make a copy of expression
node with small modification
- pull_varnos() is useful to check which relations are referenced
by the expression node.
- RestrictInfo->can_join is useful to check whether the clause is
binary operator, or not.

Anyway, reuse of existing infrastructure is the best way to make a reliable infrastructure and to keep the implementation simple.

Thanks,
--
NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Taiki Kondo
> Sent: Thursday, August 13, 2015 6:30 PM
> To: pgsql-hackers(at)postgresql(dot)org
> Cc: Kaigai Kouhei(海外浩平); Iwaasa Akio(岩浅晃郎)
> Subject: [HACKERS] [Proposal] Table partition + join pushdown
>
> Hi all,
>
> I saw the email about the idea from KaiGai-san[1], and I worked to
> implement this idea.
>
> Now, I have implemented a part of this idea, so I want to propose this
> feature.
>
> Patch attached just shows my concept of this feature.
> It works fine for EXPLAIN, but it returns wrong result for other operations, sadly.
>
>
>
> Table partition + join pushdown
> ===============================
>
> Motivation
> ----------
> To make join logic working more effectively, it is important to make
> the size of relations smaller.
>
> Especially in Hash-join, it is meaningful to make the inner relation
> smaller, because smaller inner relation can be stored within smaller hash table.
> This means that memory usage can be reduced when joining with big tables.
>
>
> Design
> ------
> It was mentioned by the email from KaiGai-san.
> So I quote below here...
>
> ---- begin quotation ---
> Let's assume a table which is partitioned to four portions, and
> individual child relations have constraint by hash-value of its ID
> field.
>
> tbl_parent
> + tbl_child_0 ... CHECK(hash_func(id) % 4 = 0)
> + tbl_child_1 ... CHECK(hash_func(id) % 4 = 1)
> + tbl_child_2 ... CHECK(hash_func(id) % 4 = 2)
> + tbl_child_3 ... CHECK(hash_func(id) % 4 = 3)
>
> If someone tried to join another relation with tbl_parent using
> equivalence condition, like X = tbl_parent.ID, we know inner tuples
> that does not satisfies the condition
> hash_func(X) % 4 = 0
> shall be never joined to the tuples in tbl_child_0.
> So, we can omit to load these tuples to inner hash table preliminary,
> then it potentially allows to split the inner hash-table.
>
> Current typical plan structure is below:
>
> HashJoin
> -> Append
> -> SeqScan on tbl_child_0
> -> SeqScan on tbl_child_1
> -> SeqScan on tbl_child_2
> -> SeqScan on tbl_child_3
> -> Hash
> -> SeqScan on other_table
>
> It may be rewritable to:
>
> Append
> -> HashJoin
> -> SeqScan on tbl_child_0
> -> Hash ... Filter: hash_func(X) % 4 = 0
> -> SeqScan on other_table
> -> HashJoin
> -> SeqScan on tbl_child_1
> -> Hash ... Filter: hash_func(X) % 4 = 1
> -> SeqScan on other_table
> -> HashJoin
> -> SeqScan on tbl_child_2
> -> Hash ... Filter: hash_func(X) % 4 = 2
> -> SeqScan on other_table
> -> HashJoin
> -> SeqScan on tbl_child_3
> -> Hash ... Filter: hash_func(X) % 4 = 3
> -> SeqScan on other_table
> ---- end quotation ---
>
> In the quotation above, it was written that filter is set at Hash node.
> But I implemented that filter is set at SeqScan node under Hash node.
> In my opinion, filtering tuples is work of Scanner.
>
> Append
> -> HashJoin
> -> SeqScan on tbl_child_0
> -> Hash
> -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 0
> -> HashJoin
> -> SeqScan on tbl_child_1
> -> Hash
> -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 1
> -> HashJoin
> -> SeqScan on tbl_child_2
> -> Hash
> -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 2
> -> HashJoin
> -> SeqScan on tbl_child_3
> -> Hash
> -> SeqScan on other_table ... Filter: hash_func(X) % 4 = 3
>
>
> API
> ---
> There are 3 new internal (static) functions to implement this feature.
> try_hashjoin_pushdown(), which is main function of this feature, is
> called from try_hashjoin_path(), and tries to push down HashPath under
> the AppendPath.
>
> To do so, this function does following operations.
>
> 1. Check if this Hash-join can be pushed down under AppendPath
> 2. To avoid an influence on other Path making operation,
> copy inner path's RelOptInfo and make new SeqScan path from it.
> At here, get CHECK() constraints from OUTER path, and convert its
> Var node according to join condition. And also convert Var nodes
> in join condition itself.
> 3. Create new HashPath nodes between each sub-paths of AppendPath and
> inner path made above.
> 4. When operations from 1 to 3 are done for each sub-paths,
> create new AppendPath which sub-paths are HashPath nodes made above.
>
> get_replaced_clause_constr() is called from try_hashjoin_pushdown(),
> and get_var_nodes_recurse() is called from get_replaced_cluase_constr().
> These 2 functions help above operations.
> (I may revise this part to use expression_tree_walker() and
> expression_tree_mutator().)
>
> In patch attached, it has the following limitations.
> o It only works for hash-join operation.
> (I want to support not only hash-join but also other logic.)
> o Join conditions must be "=" operator with int4 variables.
> o Inner path must be SeqScan.
> (I want to support other path-node.)
> o And now, planner may not choose this plan,
> because estimated costs are usually larger than original (non-pushdown) plan.
>
> And also 1 internal (static) function, get_relation_constraints()
> defined in plancat.c, is changed to global. This function will be
> called from
> get_replaced_clause_constr() to get CHECK() constraints.
>
>
> Usage
> -----
> To use this feature, create partition tables and small table to join,
> and run select operation with joining these tables.
>
> For your convenience, I attach DDL and DML script.
> And I also attach the result of EXPLAIN.
>
>
> Any comments are welcome. But, at first, I need your advices to
> correct this patch's behavior.
>
> At least, I think it has to expand array of RangeTblEntry and other
> arrays defined in PlannerInfo to register new RelOptInfos for new Path nodes mentioned above.
> Or, is it better choice to modify query parser to implement this
> feature more further?
>
>
> Remarks :
> [1]
> http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80
> 10F672
> B(at)BPXM15GP(dot)gisp(dot)nec(dot)co(dot)jp
>
>
> Best regards,
> --
> Taiki Kondo
>
> NEC Solution Innovators, Ltd.

Attachment	Content-Type	Size
join_pushdown.v1.patch	application/octet-stream	47.8 KB

In response to

Re: [Proposal] Table partition + join pushdown at 2015-08-18 08:47:09 from Kouhei Kaigai

Responses

Re: [Proposal] Table partition + join pushdown at 2015-09-29 02:45:47 from Kouhei Kaigai

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Stephen Frost	2015-09-24 11:25:31	Re: No Issue Tracker - Say it Ain't So!
Previous Message	Gavin Flower	2015-09-24 10:52:49	Re: [COMMITTERS] pgsql: Use gender-neutral language in documentation