From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Improving our clauseless-join heuristics |
Date: | 2012-04-16 03:26:52 |
Message-ID: | 003c01cd1b80$c4a8c0d0$4dfa4270$%kapila@huawei.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I want to clarify small doubt in this regard.
In function make_rels_by_clause_joins(..), it tries to join the given
relation old_rel with other relations if there exist a join between them.
What I can understand is, it is because if there exists a join condition its
better to join with that relation.
However if the given relation old_rel is not able to join any relation, then
why can't it try to make cross-join with other relations there itself.
-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
Sent: Friday, April 13, 2012 8:02 PM
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: [HACKERS] Improving our clauseless-join heuristics
I looked into the behavior complained of here:
http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php
The problem query can be abstracted to
select * from a, b, c, d
where a.x = b.y and (a.z = c.c or a.z = d.d)
Table a is much larger than the others (in fact, in the given example
c and d are known to be single rows), and there are indexes on the
mentioned columns of a. In this situation, the best plan is to
cross-join c and d, then use a BitmapOr indexscan to pick out the rows
of a that satisfy the OR condition, and finally join that small number
of rows to b. The planner will use a cross-join-first plan if we omit
b and the first WHERE clause from the query; but in the query as given,
it fails to discover that plan and falls back on a vastly inferior plan
that involves forming the a/b join first.
The reason for this behavior is the anti-clauseless-join heuristics in
join_search_one_level(). Without b, there are no join clauses available
at join level 2, so the planner is forced to form all three 2-way cross
joins; and then at level 3 it finds out that joining a to c/d works
well. With b, we find the a/b join has a usable join clause so we form
that join, and then we decide not to make any 2-way clauseless joins.
So the c/d join is never constructed and there is no way to exploit the
desirable indexscan at higher levels.
After some reflection I think that the blame should be pinned on
have_relevant_joinclause(), which is essentially defined as "is there
any join clause that can be evaluated at the join of these two
relations?". I think it would work better to define it as "is there any
join clause that both these relations participate in?". In the majority
of real-world queries, join clauses relate exactly two relations, so
that these two definitions are equivalent. However, when we do have
join clauses involving 3 or more relations, such as the OR clause in
this example, it's evidently useful to consider cross-product joins of
the smaller relations so that the join clause can be applied during the
scan of the largest table.
It would probably not be a good idea to back-patch such a change, since
it might have consequences I can't foresee at the moment. But I'm
strongly tempted to squeeze it into 9.2. Thoughts?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-04-16 03:39:54 | Re: Improving our clauseless-join heuristics |
Previous Message | Michael Meskes | 2012-04-16 02:46:36 | Re: ECPG FETCH readahead |