Re: *_collapse_limit, geqo_threshold

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: *_collapse_limit, geqo_threshold
Date: 2009-07-09 12:23:28
Message-ID: 20090709122328.GB28380@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 08, 2009 at 10:28:02AM -0500, Robert Haas wrote:
> On Jul 8, 2009, at 8:43 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> The expontential factor seems smaller for real queries. I have a query of
>> sixteen joins that takes 71s to plan deterministically; it looks like this:
>>
>> SELECT 1 FROM fact JOIN dim0 ... JOIN dim6
>> JOIN t t0 ON fact.key = t.key AND t.x = MCV0
>> LEFT JOIN t t1 ON fact.key = t.key AND t.x = MCV1
>> JOIN t t2 ON fact.key = t.key AND t.x = MCV2
>> LEFT JOIN t t3 ON fact.key = t.key AND t.x = NON-MCV0
>> LEFT JOIN t t4 ON fact.key = t.key AND t.x = NON-MCV1
>> LEFT JOIN t t5 ON fact.key = t.key AND t.x = NON-MCV2
>> LEFT JOIN t t6 ON fact.key = t.key AND t.x = NON-MCV3
>> LEFT JOIN t t7 ON fact.key = t.key AND t.x = NON-MCV4
>
> Very interesting! I am guessing that the problem here is that all the
> inner joins commute, as do all the left joins, so there are a lot of
> possibilities to explore. I also suspect that the actual join order
> doesn't matter much, so it's a good candidate for GEQO. Even if you had
> some restriction clauses against your dimension/t tables, that would
> probably just mean that you want to do those joins first, and the rest
> afterwards, which still seems like it ought to be OK for GEQO.
>
> But, in many queries this size, the join order is more constrained.
> Some of the later joins depend on the tables added by the earlier ones,
> rather than all referring back to the base table. Is there some way we
> could estimate the number of join offerings we'll have to consider
> relatively cheaply? That might be a better metric than # of tables.

Observing the pathological case taking plan time proportional to 4^N, apply
geqo_threshold as "use GEQO when comparing more than geqo_threshold * 4^N join
order possibilities"? I'm not sure whether that figure is available (early
enough) to factor into the decision. Such a change would probably imply a lower
default geqo_threshold, around 9 to 11. The number of typical queries subject
to GEQO would nonetheless decrease.

nm

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Golub 2009-07-09 12:52:05 Re: bytea vs. pg_dump
Previous Message Itagaki Takahiro 2009-07-09 08:51:18 Re: multi-threaded pgbench