From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Improving our clauseless-join heuristics |
Date: | 2012-04-13 14:46:16 |
Message-ID: | CA+TgmoYDinnsLBkkr+YbpsO1hurVGk5+C256o-XGX8VqH27mow@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I looked into the behavior complained of here:
> http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php
>
> The problem query can be abstracted to
>
> select * from a, b, c, d
> where a.x = b.y and (a.z = c.c or a.z = d.d)
>
> Table a is much larger than the others (in fact, in the given example
> c and d are known to be single rows), and there are indexes on the
> mentioned columns of a. In this situation, the best plan is to
> cross-join c and d, then use a BitmapOr indexscan to pick out the rows
> of a that satisfy the OR condition, and finally join that small number
> of rows to b. The planner will use a cross-join-first plan if we omit
> b and the first WHERE clause from the query; but in the query as given,
> it fails to discover that plan and falls back on a vastly inferior plan
> that involves forming the a/b join first.
>
> The reason for this behavior is the anti-clauseless-join heuristics in
> join_search_one_level(). Without b, there are no join clauses available
> at join level 2, so the planner is forced to form all three 2-way cross
> joins; and then at level 3 it finds out that joining a to c/d works
> well. With b, we find the a/b join has a usable join clause so we form
> that join, and then we decide not to make any 2-way clauseless joins.
> So the c/d join is never constructed and there is no way to exploit the
> desirable indexscan at higher levels.
>
> After some reflection I think that the blame should be pinned on
> have_relevant_joinclause(), which is essentially defined as "is there
> any join clause that can be evaluated at the join of these two
> relations?". I think it would work better to define it as "is there any
> join clause that both these relations participate in?". In the majority
> of real-world queries, join clauses relate exactly two relations, so
> that these two definitions are equivalent. However, when we do have
> join clauses involving 3 or more relations, such as the OR clause in
> this example, it's evidently useful to consider cross-product joins of
> the smaller relations so that the join clause can be applied during the
> scan of the largest table.
>
> It would probably not be a good idea to back-patch such a change, since
> it might have consequences I can't foresee at the moment. But I'm
> strongly tempted to squeeze it into 9.2. Thoughts?
I think it's getting a little late in the day to be whacking the
planner around too much, but I have to admit that seems like a pretty
good and safe change to me, so maybe we should go ahead and do it.
I'm a bit worried, though, that with all the planner changes this
release we are going to spend a lot of time tracking down regressions
either in planning time or in plan quality.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-04-13 14:51:29 | Re: Improving our clauseless-join heuristics |
Previous Message | Robert Haas | 2012-04-13 14:40:34 | Re: Memory usage during sorting |