Re: [Proposal] Table partition + join pushdown

From: Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Akio Iwaasa <aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table partition + join pushdown
Date: 2015-10-08 08:28:04
Message-ID: 12A9442FBAE80D4E8953883E0B84E0885F9913@BPXM01GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Horiguchi-san.

Thank you for your comment.

> I got some warning on compilation on unused variables and wrong
> arguemtn type.

OK, I'll fix it.

> I failed to have a query that this patch works on. Could you let
> me have some specific example for this patch?

Please find attached.
And also make sure that setting of work_mem is '64kB' (not 64MB).

If there is the large work_mem enough to create hash table for
relation after appending, its cost may be better than pushed-down
plan's cost, then planner will not choose pushed-down plan this patch makes.
So, to make this patch working fine, work_mem size must be smaller than
the hash table size for relation after appending.

> This patch needs more comments. Please put comment about not only
> what it does but also the reason and other things for it.

OK, I'll put more comments in the code.
But it will take a long time, maybe...

> -- about namings
>
> Names for functions and variables are needed to be more
> appropriate, in other words, needed to be what properly informs
> what they are. The followings are the examples of such names.

Thank you for your suggestion.

I also think these names are not good much.
I'll try to make names better , but it maybe take a long time...
Of course, I will use your suggestion as reference.

> "added_restrictlist"'s widely distributed as many function
> arguemnts and JoinPathExtraData makes me feel
> dizzy..

"added_restrictinfo" will be deleted from almost functions
other than try_join_pushdown() in next (v2) patch because
the place of filtering using this info will be changed
from Join node to Scan node and not have to place it
into other than try_join_pushdown().

> In make_restrictinfos_from_check_constr, the function returns
> modified constraint predicates correspond to vars under
> hashjoinable join clauses. I don't think expression_tree_mutator
> is suitable to do that since it could allow unwanted result when
> constraint predicates or join clauses are not simple OpExpr's.

Do you have any example of this situation?
I am trying to find unwanted results you mentioned, but I don't have
any results in this timing. I have a hunch that it will allow unwanted
results because I have thought only about very simple situation for
this function.

> Otherwise could you give me clear explanation on what it does?

This function transfers CHECK() constraint to filter expression by following
procedures.
(1) Get outer table's CHECK() constraint by using get_relation_constraints().
(2) Walk through expression tree got in (1) by using expression_tree_mutator()
with check_constraint_mutator() and change only outer's Var node to
inner's one according to join clause.

For example, when CHECK() constraint of table A is "num % 4 = 0" and
join clause between table A and B is "A.num = B.data",
then we can get "B.data % 4 = 0" for filtering purpose.

This also accepts more complex join clause like "A.num = B.data * 2",
then we can get "(B.data * 2) % 4 = 0".

In procedure (2), to decide whether to use each join clause for changing
Var node or not, I implement check_constraint_mutator() to judge whether
join clause is hash-joinable or not.

Actually, I want to judge whether OpExpr as top expression tree of
join clause means "=" or not, but I can't find how to do it.

If you know how to do it, please let me know.

Best regards,
--
Taiki Kondo

NEC Solution Innovators, Ltd.

-----Original Message-----
From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
Sent: Tuesday, October 06, 2015 8:35 PM
To: tai-kondo(at)yk(dot)jp(dot)nec(dot)com
Cc: kaigai(at)ak(dot)jp(dot)nec(dot)com; aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [Proposal] Table partition + join pushdown

Hello.

I tried to read this and had some random comments on this.

-- general

I got some warning on compilation on unused variables and wrong arguemtn type.

I failed to have a query that this patch works on. Could you let me have some specific example for this patch?

This patch needs more comments. Please put comment about not only what it does but also the reason and other things for it.

-- about namings

Names for functions and variables are needed to be more appropriate, in other words, needed to be what properly informs what they are. The followings are the examples of such names.

"added_restrictlist"'s widely distributed as many function arguemnts and JoinPathExtraData makes me feel dizzy.. create_mergejoin_path takes it as "filtering_clauses", which looks far better.

try_join_pushdown() is also the name with much wider meaning. This patch tries to move hashjoins on inheritance parent to under append paths. It could be generically called 'pushdown'
but this would be better be called such like 'transform appended hashjoin' or 'hashjoin distribution'. The latter would be better.
(The function name would be try_distribute_hashjoin for the
case.)

The name make_restrictinfos_from_check_contr() also tells me wrong thing. For example,
extract_constraints_for_hashjoin_distribution() would inform me about what it returns.

-- about what make_restrictinfos_from_check_constr() does

In make_restrictinfos_from_check_constr, the function returns modified constraint predicates correspond to vars under hashjoinable join clauses. I don't think expression_tree_mutator is suitable to do that since it could allow unwanted result when constraint predicates or join clauses are not simple OpExpr's.

Could you try more simple and straight-forward way to do that?
Otherwise could you give me clear explanation on what it does?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pushdown_test.v1.sql application/octet-stream 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amir Rohan 2015-10-08 09:03:57 Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Previous Message Michael Paquier 2015-10-08 07:39:29 Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.