Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Greg Sabino Mullane <greg(at)turnstep(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Date: 2009-11-27 23:23:21
Message-ID: 603c8f070911271523n4be4d1e3m9df602a7716d9608@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Fri, Nov 27, 2009 at 3:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 9, 2009 at 1:42 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Nov 9, 2009 at 1:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> On Mon, Nov 9, 2009 at 10:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> Too bad you don't have debug symbols ... it'd be interesting to see
>>>>> how long that list is.
>>>
>>>> I stopped it a couple of times.  Lengths of list1, list2 respectively:
>>>
>>>> 8876, 20
>>>> 14649, 18
>>>> 15334, 10
>>>> 17148, 18
>>>> 18173, 18
>>>
>>> Yowza.  18000 distinct paths for one relation?  Could we see the test
>>> case?
>>
>> Well, the test case isn't simple, and I'm not sure that my employer
>> would be pleased if I posted it to a public mailing list. The general
>> thrust of it is that [...]
>
> Test case attached.  Load this into an empty database and then:
>
> set geqo to off;
> set join_collapse_limit to 100;
> set from_collapse_limit to 100;
> select * from foo_view order by name;
>
> I guess it's somewhat unsurprising that you can make the planner get
> into trouble with the above settings - we've been over that ground
> before.  But it might be possibly interesting that such a simple
> schema is capable of generating so many paths.  This breaks 40,000
> without finishing.

Hey, wait a minute. Unless I'm missing something, the items being
accumulated here are not paths (as Tom said upthread and I naively
believed), but RelOptInfos. It looks like we create a RelOptInfo for
every legal join order, so this is going to happen on any large-ish
query where the joins can be reordered relatively freely.

I don't think there's any easy fix for this. When the joins can be
reordered relatively freely, the number of legal join orders is
roughly n!. It might be possible to reduce the overhead associated
with a single one of those join orderings (which consists of a
RelOptInfo, some supporting data structures, and however many paths)
but given the explosive growth of n! that won't buy very much at all,
and it won't make the lists shorter in any case.

Playing around with this a bit, I was easily able to get 2-second
planing times on 15 table join, 6-second planning times on a 16 table
join and 30-second planning times on a 17 table join. This makes it
hard to support raising the GEQO threshold, as most recently suggested
by Greg Sabino Mullane here (and previously by me on an earlier
thread):

http://archives.postgresql.org/pgsql-hackers/2009-11/msg01106.php

What sucks about the current geqo_threshold mechanism is that the
number of tables is a fairly coarse predictor of the number of legal
join orders. On some queries it is close to n! - on others it is far
less. It would be nice if we had a way to either (1) approximate the
number of legal join orders before we start planning the query, and
base the GEQO decision on that number rather than on the number of
tables, or (2) always start with the standard (exhaustive) planner,
and switch to some kind of approximation algorithm (GEQO or otherwise)
if we find that to be impractical. But neither of these seems at all
straightforward.

...Robert

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message marcin mank 2009-11-28 00:33:21 Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Previous Message Tom Lane 2009-11-27 23:04:58 Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a

Browse pgsql-hackers by date

  From Date Subject
Next Message marcin mank 2009-11-28 00:33:21 Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Previous Message Jeff Davis 2009-11-27 23:12:16 Re: New VACUUM FULL