Re: Improving our clauseless-join heuristics

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving our clauseless-join heuristics
Date: 2012-04-17 02:03:47
Message-ID: 000c01cd1c3e$54312770$fc937650$%kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>I might still be misunderstanding, but I think what you are suggesting
>>is that in the loop in make_rels_by_clause_joins, if we find that the
>>old_rel doesn't have a join clause/restriction with the current
>>other_rel, we check to see whether other_rel has any join clauses at
>>all, and force the join to occur anyway if it doesn't.

It is on similar lines, but the only difference is that it will try to join
old_rel with other_rel list incase
old_rel is not able to join with any of other_rel in the list with proper
join clause between them.

>>In the second place, instead of N tests to see whether a rel lacks any
join clauses,
>>we'd now have something approaching O(N^2) such tests, in the typical
>>case where most base rels are directly joined to only a few other rels.

In the typical case where most base rels are directly joined to only a few
other rels, it will not go in the path where it has to try new combinations,
as the idea is that if in first pass the old_rel is not able find any
suitable rel to join then only it will go in second pass to try to join with
other_rel list.

>>In the first place, it seems to miss the need to
>>clauseless-join two relations when neither of them have any join
>>clauses, for instance plain old "SELECT * FROM a, b". So you still need
>>something like the make_rels_by_clauseless_joins code path, and it's
>>not entirely clear how to avoid duplicated work there.

I believe this will not be related to clause-less joins because we have
entered the function make_rels_by_clause_joins() for relation which has join
clause and which can be joined to one of the other_rel. So in most cases I
believe it will find one of the members in other_rel list with a proper join
clause. Only in some cases when it is not able to find any of the other_rel
with proper join clause between them, it will go in new path. So in a way if
we see this case handling is different from handling clause-less join case.

The other way like you said "is there any join clause that both these
relations participate in?"

Does this mean that we will evaluate the join list to see if there can be
any valid join between 2 relations.
Is it possible that in all scenarios or cases we will be able to find the
existence of any valid join which
can be possible may be not at this level but at next level.

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Tuesday, April 17, 2012 2:05 AM
To: Amit Kapila
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics

Amit Kapila <amit(dot)kapila(at)huawei(dot)com> writes:
> For this kind of query, currently (referring 9.0.3 code) also it considers
> join of b,c and b,d.
> As there is no join clause between b,c,d so it will go in path of
> make_rels_by_clauseless_joins() where it considers join of b,c and b,d.

> In this kind of query, if the suggestion by me in below mail is followed,
> then it will consider joining a,b a,c a,d at level-2 in function
> make_rels_by_clause_joins() which it currently doesn't do which may
generate
> useless join paths.
> However in real-world scenario's this kind of queries where 2 cols of
> different tables are
> used in one side expression (b.y + c.z) of where clause may be less.

> On the other hand, when we come to consider d, it will have no join
> clauses so we will consider joining it to each other rel in turn.

> When it come to consider d, as at level -2 it only consider later rels. So
> it should not consider joining with each other rel.

I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

I can't really get excited about doing it that way instead of the
current way. In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old "SELECT * FROM a, b". So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there. In the second
place, instead of N tests to see whether a rel lacks any join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.
So it seems to make things slower for little obvious benefit.

In general, queries with join-clause-less rels are pretty uncommon,
so I don't want to introduce extra work into make_rels_by_clause_joins
to handle the case.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2012-04-17 02:09:14 Re: A typo fix in a comment in xlog.c
Previous Message Michael Nolan 2012-04-16 23:54:30 Re: Slow temporary tables when using sync rep