Re: Improving our clauseless-join heuristics

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving our clauseless-join heuristics
Date: 2012-04-16 07:00:58
Message-ID: 004e01cd1b9e$adca9550$095fbff0$%kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>Another way to look at this is that if we have

>> select ... from a,b,c,d where a.x = b.y + c.z

>>we want to consider a cross-join of b and c, in the hopes that we can do
>>something useful with the join clause at the next level where it can
>>join to a. From b's perspective there is no percentage in joining to d.

For this kind of query, currently (referring 9.0.3 code) also it considers
join of b,c and b,d.
As there is no join clause between b,c,d so it will go in path of
make_rels_by_clauseless_joins() where it considers join of b,c and b,d.

In this kind of query, if the suggestion by me in below mail is followed,
then it will consider joining a,b a,c a,d at level-2 in function
make_rels_by_clause_joins() which it currently doesn't do which may generate
useless join paths.
However in real-world scenario's this kind of queries where 2 cols of
different tables are
used in one side expression (b.y + c.z) of where clause may be less.

>>On the other hand, when we come to consider d, it will have no join
>>clauses so we will consider joining it to each other rel in turn.

When it come to consider d, as at level -2 it only consider later rels. So
it should not consider joining with each other rel.

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, April 16, 2012 9:57 AM
To: Amit Kapila
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics

Amit Kapila <amit(dot)kapila(at)huawei(dot)com> writes:
> That case is handled by make_rels_by_clauseless_joins
> It will be handled by make_rels_by_clauseless_joins() if given rel old_rel
> doesn't have any join clause.
> However if it has join clause but doesn't able to join with any other rels
> like in the example you have provided for relation c, it is not able to
join
> with other rel d.
> In such cases it can do cross-join with d, because it has not found any
> relation to join with.
> Doesn't it will address the problem you mentioned?

Sounds to me like that's going in the wrong direction, ie, joining to
exactly the wrong relations. If you have to cross-join it's better to
cross-join against relations that are included in one of the join
clauses that does mention the current relation.

Another way to look at this is that if we have

select ... from a,b,c,d where a.x = b.y + c.z

we want to consider a cross-join of b and c, in the hopes that we can do
something useful with the join clause at the next level where it can
join to a. From b's perspective there is no percentage in joining to d.
On the other hand, when we come to consider d, it will have no join
clauses so we will consider joining it to each other rel in turn. So
if there is any value in joining d early, we will find that out.

Or at least that's the theory. Now that I look at it this way, I think
there is a bug here: when we are at level 2, we only consider later rels
in the list for forced clauseless joins. That would be okay if the
condition were symmetrical, but it isn't. This makes for a bogus
FROM-list ordering dependency in handling of clauseless joins. Not
sure how much that matters in the real world, but it still seems wrong.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2012-04-16 07:02:56 Re: index-only scans vs. Hot Standby, round two
Previous Message Nikhil Sontakke 2012-04-16 06:56:06 Re: how to create a non-inherited CHECK constraint in CREATE TABLE