Re: [Proposal] Table partition + join pushdown

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tai-kondo(at)yk(dot)jp(dot)nec(dot)com
Cc: kaigai(at)ak(dot)jp(dot)nec(dot)com, aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Proposal] Table partition + join pushdown
Date: 2015-10-08 10:04:10
Message-ID: 20151008.190410.219381863.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, thank you for the example.

I could see this patch working with it.

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates correspond to vars under
> > hashjoinable join clauses. I don't think expression_tree_mutator
> > is suitable to do that since it could allow unwanted result when
> > constraint predicates or join clauses are not simple OpExpr's.
>
> Do you have any example of this situation?

As a rather simple case on the test environment made by the
provided script, the following query,

explain analyze
select data_x, data_y, num from check_test_div join inner_t on check_test_div.id + 1 = inner_t.id;

Makes the mutation fail then result in an assertion failure.

| TRAP: FailedAssertion("!(list_length(check_constr) == list_length(result))", File: "joinpath.c", Line: 1608)

This is because both 'check_test_div.id + 1' and inner_t.id don't
match the var-side of the constraints.

I don't see clearly what to do for the situation for now but this
is the one of the most important function for this feature and
should be cleanly designed.

At Thu, 8 Oct 2015 08:28:04 +0000, Taiki Kondo <tai-kondo(at)yk(dot)jp(dot)nec(dot)com> wrote in <12A9442FBAE80D4E8953883E0B84E0885F9913(at)BPXM01GP(dot)gisp(dot)nec(dot)co(dot)jp>
> Hello, Horiguchi-san.
>
> Thank you for your comment.
>
> > I got some warning on compilation on unused variables and wrong
> > arguemtn type.
>
> OK, I'll fix it.
>
> > I failed to have a query that this patch works on. Could you let
> > me have some specific example for this patch?
>
> Please find attached.
> And also make sure that setting of work_mem is '64kB' (not 64MB).
>
> If there is the large work_mem enough to create hash table for
> relation after appending, its cost may be better than pushed-down
> plan's cost, then planner will not choose pushed-down plan this patch makes.
> So, to make this patch working fine, work_mem size must be smaller than
> the hash table size for relation after appending.
>
> > This patch needs more comments. Please put comment about not only
> > what it does but also the reason and other things for it.
>
> OK, I'll put more comments in the code.
> But it will take a long time, maybe...

Sure.

> > -- about namings
> >
> > Names for functions and variables are needed to be more
> > appropriate, in other words, needed to be what properly informs
> > what they are. The followings are the examples of such names.
>
> Thank you for your suggestion.
>
> I also think these names are not good much.
> I'll try to make names better , but it maybe take a long time...
> Of course, I will use your suggestion as reference.

Thanks.

> > "added_restrictlist"'s widely distributed as many function
> > arguemnts and JoinPathExtraData makes me feel
> > dizzy..
>
> "added_restrictinfo" will be deleted from almost functions
> other than try_join_pushdown() in next (v2) patch because
> the place of filtering using this info will be changed
> from Join node to Scan node and not have to place it
> into other than try_join_pushdown().

I'm looking forward to see it.

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates correspond to vars under
> > hashjoinable join clauses. I don't think expression_tree_mutator
> > is suitable to do that since it could allow unwanted result when
> > constraint predicates or join clauses are not simple OpExpr's.
>
> Do you have any example of this situation?
> I am trying to find unwanted results you mentioned, but I don't have
> any results in this timing. I have a hunch that it will allow unwanted
> results because I have thought only about very simple situation for
> this function.

As mentioned above.

> > Otherwise could you give me clear explanation on what it does?
>
> This function transfers CHECK() constraint to filter expression by following
> procedures.
> (1) Get outer table's CHECK() constraint by using get_relation_constraints().
> (2) Walk through expression tree got in (1) by using expression_tree_mutator()
> with check_constraint_mutator() and change only outer's Var node to
> inner's one according to join clause.
>
> For example, when CHECK() constraint of table A is "num % 4 = 0" and
> join clause between table A and B is "A.num = B.data",
> then we can get "B.data % 4 = 0" for filtering purpose.
>
> This also accepts more complex join clause like "A.num = B.data * 2",
> then we can get "(B.data * 2) % 4 = 0".
>
> In procedure (2), to decide whether to use each join clause for changing
> Var node or not, I implement check_constraint_mutator() to judge whether
> join clause is hash-joinable or not.

Thanks for the explanation. I think that the function has been
made considering only rather plain calses. We should put more
thought to making the logic clearer so that we can define the
desired/possible capability and limitations clearly.

> Actually, I want to judge whether OpExpr as top expression tree of
> join clause means "=" or not, but I can't find how to do it.
>
> If you know how to do it, please let me know.

I don't see for now, too :p

But we at least should put more consideration on the mechanism to
obtain the expressions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-10-08 10:11:07 proposal: PL/Pythonu - function ereport
Previous Message Fujii Masao 2015-10-08 10:03:26 Re: Freeze avoidance of very large table.