Re: [Proposal] Table partition + join pushdown

From: Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Hiroshi Yanagisawa" <hir-yanagisawa(at)ut(dot)jp(dot)nec(dot)com>
Subject: Re: [Proposal] Table partition + join pushdown
Date: 2015-10-21 11:07:03
Message-ID: 12A9442FBAE80D4E8953883E0B84E088606D16@BPXM01GP.gisp.nec.co.jp
Lists: pgsql-hackers

Hello, KaiGai-san and Horiguchi-san.

I have created the v2 patch; please find it attached.
I believe this patch fixes most of the issues mentioned by
Horiguchi-san, except for the naming.

In this v2 patch, the scan node that is originally the inner relation of
the Join node must be a SeqScan (or SampleScan). This limitation comes
from the implementation of try_join_pushdown(), which copies Path nodes
to attach new filtering conditions converted from CHECK() constraints.

It uses copyObject() for this purpose, so to lift this limitation I would
have to implement copy functions for the other scan Path nodes such as
IndexPath, BitmapHeapPath, TidPath, and so on.

By the way, let me show the performance of this feature.
Here are the results measured in my environment.
They were obtained by running "pushdown_test.v1.large.sql"
with "work_mem" set to "1536kB".
(This file is also attached to this mail.)

[Normal]
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1851.02..14638.11 rows=300004 width=20) (actual time=88.188..453.926 rows=299992 loops=1)
Hash Cond: (check_test_div.id = inner_t.id)
-> Append (cost=0.00..4911.03 rows=300004 width=20) (actual time=0.089..133.456 rows=300003 loops=1)
-> Seq Scan on check_test_div (cost=0.00..0.00 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on check_test_div_0 (cost=0.00..1637.01 rows=100001 width=20) (actual time=0.085..40.741 rows=100001 loops=1)
-> Seq Scan on check_test_div_1 (cost=0.00..1637.01 rows=100001 width=20) (actual time=0.023..29.213 rows=100001 loops=1)
-> Seq Scan on check_test_div_2 (cost=0.00..1637.01 rows=100001 width=20) (actual time=0.021..28.592 rows=100001 loops=1)
-> Hash (cost=866.01..866.01 rows=60001 width=8) (actual time=87.970..87.970 rows=60001 loops=1)
Buckets: 32768 Batches: 2 Memory Usage: 1446kB
-> Seq Scan on inner_t (cost=0.00..866.01 rows=60001 width=8) (actual time=0.030..39.133 rows=60001 loops=1)
Planning time: 0.867 ms
Execution time: 470.269 ms
(12 rows)

[With this feature]
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.01..10651.37 rows=300004 width=20) (actual time=55.548..377.615 rows=299992 loops=1)
-> Hash Join (cost=0.01..1091.04 rows=1 width=20) (actual time=0.017..0.017 rows=0 loops=1)
Hash Cond: (inner_t.id = check_test_div.id)
-> Seq Scan on inner_t (cost=0.00..866.01 rows=60001 width=8) (never executed)
-> Hash (cost=0.00..0.00 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on check_test_div (cost=0.00..0.00 rows=1 width=20) (actual time=0.002..0.002 rows=0 loops=1)
-> Hash Join (cost=1169.76..3186.78 rows=100001 width=20) (actual time=55.530..149.205 rows=100001 loops=1)
Hash Cond: (check_test_div_0.id = inner_t.id)
-> Seq Scan on check_test_div_0 (cost=0.00..1637.01 rows=100001 width=20) (actual time=0.058..34.268 rows=100001 loops=1)
-> Hash (cost=1166.01..1166.01 rows=300 width=8) (actual time=55.453..55.453 rows=20001 loops=1)
Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB
-> Seq Scan on inner_t (cost=0.00..1166.01 rows=300 width=8) (actual time=0.031..43.590 rows=20001 loops=1)
Filter: ((id % 3) = 0)
Rows Removed by Filter: 40000
-> Hash Join (cost=1169.76..3186.78 rows=100001 width=20) (actual time=27.942..97.582 rows=99996 loops=1)
Hash Cond: (check_test_div_1.id = inner_t.id)
-> Seq Scan on check_test_div_1 (cost=0.00..1637.01 rows=100001 width=20) (actual time=0.030..25.514 rows=100001 loops=1)
-> Hash (cost=1166.01..1166.01 rows=300 width=8) (actual time=27.890..27.890 rows=20000 loops=1)
Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB
-> Seq Scan on inner_t (cost=0.00..1166.01 rows=300 width=8) (actual time=0.014..21.688 rows=20000 loops=1)
Filter: ((id % 3) = 1)
Rows Removed by Filter: 40001
-> Hash Join (cost=1169.76..3186.78 rows=100001 width=20) (actual time=27.651..97.755 rows=99995 loops=1)
Hash Cond: (check_test_div_2.id = inner_t.id)
-> Seq Scan on check_test_div_2 (cost=0.00..1637.01 rows=100001 width=20) (actual time=0.026..25.620 rows=100001 loops=1)
-> Hash (cost=1166.01..1166.01 rows=300 width=8) (actual time=27.599..27.599 rows=20000 loops=1)
Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1038kB
-> Seq Scan on inner_t (cost=0.00..1166.01 rows=300 width=8) (actual time=0.017..21.307 rows=20000 loops=1)
Filter: ((id % 3) = 2)
Rows Removed by Filter: 40001
Planning time: 1.876 ms
Execution time: 394.007 ms
(33 rows)

The value of "Batches" is 2 on the Hash node in the normal plan,
but it is 1 on all Hash nodes with this feature.

This means that, with this feature, the hash table does not have to be
split into multiple batches.

Therefore, PostgreSQL with this feature is faster than the normal one in this case
(470.269 ms normal vs. 394.007 ms with this feature).

I think this is a large benefit of this feature.
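The effect can be sanity-checked with a quick back-of-the-envelope calculation. This is only a sketch: the id range 0..60000 for inner_t is inferred from the "rows=" and "Rows Removed by Filter" figures in the plans above, not stated in the mail.

```python
# Inferred from the plans: inner_t holds 60001 ids, apparently 0..60000.
# Splitting its hash input by the partition predicate (id % 3 = n) leaves
# roughly one third of the rows per hash table, so each one fits within
# work_mem (1536kB) in a single batch instead of two.
ids = range(0, 60001)
counts = [sum(1 for i in ids if i % 3 == r) for r in range(3)]
print(counts)  # [20001, 20000, 20000] -- matches the per-partition rows= figures
```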

Best regards,
--
Taiki Kondo

NEC Solution Innovators, Ltd.

-----Original Message-----
From: Kaigai Kouhei(海外 浩平) [mailto:kaigai(at)ak(dot)jp(dot)nec(dot)com]
Sent: Thursday, October 15, 2015 10:21 AM
To: Kondo Taiki(近藤 太樹); Kyotaro HORIGUCHI
Cc: Iwaasa Akio(岩浅 晃郎); pgsql-hackers(at)postgresql(dot)org
Subject: RE: [HACKERS] [Proposal] Table partition + join pushdown

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Taiki Kondo
> Sent: Thursday, October 08, 2015 5:28 PM
> To: Kyotaro HORIGUCHI
> Cc: Kaigai Kouhei(海外 浩平); Iwaasa Akio(岩浅 晃郎);
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [Proposal] Table partition + join pushdown
>
> Hello, Horiguchi-san.
>
> Thank you for your comment.
>
> > I got some warnings on compilation about unused variables and a wrong
> > argument type.
>
> OK, I'll fix it.
>
> > I failed to have a query that this patch works on. Could you let me
> > have some specific example for this patch?
>
> Please find attached.
> Also, make sure that work_mem is set to '64kB' (not 64MB).
>
> If work_mem is large enough to build the hash table for the relation
> after appending, that plan's cost may be lower than the pushed-down
> plan's cost, and the planner will not choose the pushed-down plan this patch makes.
> So, for this patch to work well, work_mem must be smaller than the
> hash table size for the relation after appending.
>
> > This patch needs more comments. Please put comment about not only
> > what it does but also the reason and other things for it.
>
> OK, I'll put more comments in the code.
> But it will take a long time, maybe...
>
People (including me) can help. Even if your English is limited, it is important to record the intention of the code in comments.

> > -- about namings
> >
> > Names for functions and variables are needed to be more appropriate,
> > in other words, needed to be what properly informs what they are.
> > The followings are the examples of such names.
>
> Thank you for your suggestion.
>
> I also think these names are not very good.
> I'll try to come up with better names, but it may take a long time...
> Of course, I will use your suggestions as a reference.
>
> > "added_restrictlist"'s widely distributed as many function arguments
> > and JoinPathExtraData makes me feel dizzy..
>
> "added_restrictinfo" will be removed from almost all functions other
> than try_join_pushdown() in the next (v2) patch, because the filtering
> using this info will be moved from the Join node to the Scan node, so
> there is no need to pass it anywhere other than try_join_pushdown().
>
This restrictinfo is intended to filter out obviously unrelated rows in this join, based on the check constraint on the other side of the join.
So a correct but redundant name would be:
restrictlist_to_drop_unrelated_rows_because_of_check_constraint

How about 'restrictlist_by_constraint' instead?

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates correspond to vars under hashjoinable
> > join clauses. I don't think expression_tree_mutator is suitable to
> > do that since it could allow unwanted result when constraint
> > predicates or join clauses are not simple OpExpr's.
>
> Do you have any example of this situation?
> I am trying to find the unwanted results you mentioned, but I haven't
> found any so far. I have a hunch that it could allow unwanted results,
> because I have considered only very simple situations for this
> function.
>
check_constraint_mutator() builds the modified restrictlist by replacing Var nodes only when the join clause is hash-joinable.
That implies the clause has the <expr> = <expr> form, so we can safely replace an expression with the one on the other side.

Of course, there are still cases where we cannot simply replace expressions:
- if a function (or a function called by an operator) is volatile
  (though who would use a volatile function in a CHECK constraint for partitioning?);
- if it is uncertain whether the expression always returns the same result
  (can a SubLink appear in the constraint?).

I'd like to suggest a white-list approach in this mutator routine.
It means that only expression nodes known to be immutable are allowed into the modified restrictlist.

Things to do:

check_constraint_mutator(...)
{
    if (node == NULL)
        return NULL;
    if (IsA(node, Var))
    {
        :
    }
    else if (node is not obviously immutable)
    {
        /* prohibit the mutation if the expression contains an uncertain node */
        context->is_mutated = false;
    }
    return expression_tree_mutator(...);
}
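For what it's worth, the white-list idea above can be sketched in a standalone form. The following is a Python analogue of the mutator, not PostgreSQL code; the node classes, the `mapping` argument, and the `ctx['ok']` flag are illustrative inventions standing in for Var/OpExpr nodes and the `context->is_mutated` flag.

```python
# Python analogue of the proposed white-list mutator (illustrative only,
# not PostgreSQL code). It rewrites the outer-side Var into the inner-side
# expression taken from a hash-joinable join clause, and gives up when it
# meets any node not known to be immutable.
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: int

@dataclass
class Op:
    op: str                  # e.g. '%', '=', '*'
    args: list
    immutable: bool = True   # a volatile function/operator would set False

def mutate(node, mapping, ctx):
    """Replace Vars per `mapping`; flag failure on non-immutable nodes."""
    if isinstance(node, Var):
        return mapping.get(node.name, node)
    if isinstance(node, Const):
        return node
    if isinstance(node, Op):
        if not node.immutable:
            ctx['ok'] = False      # white-list: uncertain node -> no rewrite
            return node
        return Op(node.op, [mutate(a, mapping, ctx) for a in node.args])
    ctx['ok'] = False              # unknown node kind -> give up as well
    return node

# CHECK constraint on table A: (num % 4) = 0
check = Op('=', [Op('%', [Var('A.num'), Const(4)]), Const(0)])
# Join clause A.num = B.data * 2 -> substitute A.num with (B.data * 2)
mapping = {'A.num': Op('*', [Var('B.data'), Const(2)])}

ctx = {'ok': True}
filt = mutate(check, mapping, ctx)
# filt now represents ((B.data * 2) % 4) = 0, and ctx['ok'] stays True
```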

> > Otherwise could you give me clear explanation on what it does?
>
> This function transforms a CHECK() constraint into a filter expression
> by the following procedure:
> (1) Get the outer table's CHECK() constraint using get_relation_constraints().
> (2) Walk through the expression tree obtained in (1) using expression_tree_mutator()
>     with check_constraint_mutator(), and replace only the outer side's Var
>     nodes with the inner side's, according to the join clause.
>
> For example, when the CHECK() constraint of table A is "num % 4 = 0" and
> the join clause between tables A and B is "A.num = B.data", we can derive
> "B.data % 4 = 0" for filtering purposes.
>
> This also accepts more complex join clauses like "A.num = B.data * 2",
> giving "(B.data * 2) % 4 = 0".
>
> In procedure (2), to decide whether to use each join clause for
> replacing Var nodes, check_constraint_mutator() judges whether the
> join clause is hash-joinable or not.
>
> Actually, I want to judge whether the OpExpr at the top of the join
> clause's expression tree means "=" or not, but I can't find how to do it.
>
> If you know how to do it, please let me know.
>
>
Thanks,
--
NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
> Sent: Tuesday, October 06, 2015 8:35 PM
> To: tai-kondo(at)yk(dot)jp(dot)nec(dot)com
> Cc: kaigai(at)ak(dot)jp(dot)nec(dot)com; aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [Proposal] Table partition + join pushdown
>
> Hello.
>
> I tried to read this and had some random comments on this.
>
> -- general
>
> I got some warnings on compilation about unused variables and a wrong argument type.
>
> I failed to have a query that this patch works on. Could you let me
> have some specific example for this patch?
>
> This patch needs more comments. Please put comment about not only what
> it does but also the reason and other things for it.
>
>
> -- about namings
>
> Names for functions and variables are needed to be more appropriate,
> in other words, needed to be what properly informs what they are. The
> followings are the examples of such names.
>
> "added_restrictlist"'s widely distributed as many function arguments
> and JoinPathExtraData makes me feel dizzy.. create_mergejoin_path
> takes it as "filtering_clauses", which looks far better.
>
> try_join_pushdown() is also the name with much wider meaning. This
> patch tries to move hashjoins on inheritance parent to under append
> paths. It could be generically called 'pushdown'
> but this would be better be called such like 'transform appended
> hashjoin' or 'hashjoin distribution'. The latter would be better.
> (The function name would be try_distribute_hashjoin for the
> case.)
>
> The name make_restrictinfos_from_check_constr() also tells me the wrong
> thing. For example,
> extract_constraints_for_hashjoin_distribution() would inform me about
> what it returns.
>
>
> -- about what make_restrictinfos_from_check_constr() does
>
> In make_restrictinfos_from_check_constr, the function returns modified
> constraint predicates correspond to vars under hashjoinable join
> clauses. I don't think expression_tree_mutator is suitable to do that
> since it could allow unwanted result when constraint predicates or join clauses are not simple OpExpr's.
>
> Could you try more simple and straight-forward way to do that?
> Otherwise could you give me clear explanation on what it does?
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
>

Attachment Content-Type Size
pushdown_test.v1.large.sql application/octet-stream 1.0 KB
pushdown_test.v1.sql application/octet-stream 1.9 KB
join_pushdown.v2.patch application/octet-stream 23.4 KB
