Re: *_collapse_limit, geqo_threshold

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: *_collapse_limit, geqo_threshold
Date: 2009-07-09 03:38:38
Message-ID: 20090709033838.GA27439@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 08, 2009 at 05:23:16PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > The expontential factor seems smaller for real queries. I have a query of
> > sixteen joins that takes 71s to plan deterministically; it looks like this:
>
> > SELECT 1 FROM fact JOIN dim0 ... JOIN dim6
> > JOIN t t0 ON fact.key = t.key AND t.x = MCV0
> > LEFT JOIN t t1 ON fact.key = t.key AND t.x = MCV1
> > JOIN t t2 ON fact.key = t.key AND t.x = MCV2
> > LEFT JOIN t t3 ON fact.key = t.key AND t.x = NON-MCV0
> > LEFT JOIN t t4 ON fact.key = t.key AND t.x = NON-MCV1
> > LEFT JOIN t t5 ON fact.key = t.key AND t.x = NON-MCV2
> > LEFT JOIN t t6 ON fact.key = t.key AND t.x = NON-MCV3
> > LEFT JOIN t t7 ON fact.key = t.key AND t.x = NON-MCV4
>
> I'm confused here --- I think you must have over-anonymized your query.
> Surely the ON conditions for the left joins should be referencing t3,
> t4, etc?

Yes. Aside from that error, I picked the wrong features to emphasize and gloss
over. Only two relations (amidst `dim0 ... dim6') resemble dimensions, having
an implied relative position on the plan tree. The other fourteen relations
share a join key. That explains the distinctive plan cost; this query resembles
the utterly unconstrained artificial query to a larger degree than most.

> > For the real query, removing one join drops plan time to 26s, and
> > removing two drops the time to 11s. I don't have a good theory for
> > the multiplier changing from 4 for the trivial demonstration to ~2.5
> > for this real query.
>
> The rule of thumb that says that an n-way join requires 2^n work is only
> true if we consider every single combination of possible joins, which
> normally we don't. The planner prefers join paths that follow join
> clauses, and will only consider clauseless joins when it has no other
> choice. I believe that real queries tend to be pretty non-flat in this
> space and so the number of join paths to consider is a lot less than 2^n.
> Your synthesized query, on the other hand, allows any relation to be
> joined to any other --- it might not look that way, but after creating
> derived equalities there will be a potential join clause linking every
> relation to every other one. So I think you were testing the worst case,
> and I'm not surprised that more-typical queries would show a slower
> growth curve.

Describing in those terms illuminates much. While the concepts do suggest 2^N
worst-case planning cost, my artificial test case showed a rigid 4^N pattern;
what could explain that?

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-07-09 04:06:29 Re: *_collapse_limit, geqo_threshold
Previous Message Tom Lane 2009-07-09 02:43:07 Re: *_collapse_limit, geqo_threshold