Re: GEQO vs join order restrictions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GEQO vs join order restrictions
Date: 2009-07-19 04:28:20
Message-ID: 603c8f070907182128r5e98aca9g44ba02a8103bb9f3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 18, 2009 at 12:49 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> We could refrain from collapsing the sub-problem during joinlist
> formation.  But the trouble with that is it creates a "hard" join order
> restriction.  Most of the restrictions are "soft" to some extent, ie,
> you can do some rearrangements but not others.  It might be worth
> looking at though; in the cases where there is no chance of a
> rearrangement, it would save cycles for either regular or GEQO planning.

This seems very promising, although the proof of the pudding is in the
eating. As a slight refinement of this idea, it's not exactly that
the join order is totally fixed, but rather that in a large join nest
there may be groups of tables that can be planned as independent
subproblems. The obvious example of this is something like:

(...lots of stuff...) FULL JOIN (...lots of things...)

...where there isn't any point in considering any joins between stuff
and things, regardless of whether you are using GEQO or the standard
planner (and maybe we handle this case already?). My brain is a
little foggy at the moment, but I think this would also apply to
Andres' example query, because I believe with anything of the form...

A LEFT JOIN (B1 INNER JOIN B2 INNER JOIN B3 ... INNER JOIN Bn) LEFT JOIN C

...the set of all Bi can be planned as an independent subproblem and
then joined to A either before or after C. But (I think):

A LEFT JOIN (B1 INNER JOIN B2 LEFT JOIN B3) LEFT JOIN C
= A LEFT JOIN (B1 INNER JOIN B2) LEFT JOIN C LEFT JOIN B3

Here {B1, B2} is an independent subproblem, but {B1, B2, B3} is not.
Still, given that the planning time for the standard planner at least
can grow exponentially in the number of relations, even a small
reduction in the problem size is potentially a big win.

...Robert

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-07-19 04:30:15 Re: Sampling profiler updated
Previous Message Tom Lane 2009-07-19 02:30:33 Re: Using results from INSERT ... RETURNING