Re: from_collapse_limit vs. geqo_threshold

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-25 22:15:06
Message-ID: 9134.1243289706@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> ... trying to remember why I wrote that ... what would happen if
>>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?
>>
>> I think I wrote it, not you. The point of the advice is to keep
>> subquery collapsation (hm, what's the right noun form? Need caffeine)
>> from turning a non-GEQO query into a GEQO one, and thus subjecting
>> you to unpredictable plans. Maybe the resulting plans would be better
>> on average, or maybe they wouldn't, but in any case they'd be
>> unpredictable.

> That's more or less what I figured, but my real world experience is
> that pulling up subqueries and using GEQO leads to plans that are
> random but tolerable, whereas not pulling up subqueries leads to plans
> that are almost uniformly bad.

I went back and looked at the CVS history to try to refresh my memory
about how we got here. As best I can find, there were two steps:

1. The original commit of the ability to have subqueries at all,
during 7.1 development:

2000-09-29 14:21 tgl

Subselects in FROM clause, per
ISO syntax: FROM (SELECT ...) [AS] alias. (Don't forget that an
alias is required.) Views reimplemented as expanding to
subselect-in-FROM. Grouping, aggregates, DISTINCT in views
actually work now (he says optimistically). No UNION support in
subselects/views yet, but I have some ideas about that.
Rule-related permissions checking moved out of rewriter and into
executor. INITDB REQUIRED!

This introduced the ability to pull up subqueries, but with an arbitrary
limit of geqo_threshold/2 on the number of relations that would be
collected into a single planning problem.

2. During 7.4 development, we did this:

2003-01-25 18:10 tgl

Allow the planner to collapse explicit inner JOINs together, rather
than necessarily following the JOIN syntax to develop the query
plan. The old behavior is still available by setting GUC variable
JOIN_COLLAPSE_LIMIT to 1. Also create a GUC variable
FROM_COLLAPSE_LIMIT to control the similar decision about when to
collapse sub-SELECT lists into their parent lists. (This behavior
existed already, but the limit was always GEQO_THRESHOLD/2; now
it's separately adjustable.)

The excuse for join_collapse_limit to exist at all is largely one of
backwards compatibility. Up to then, we had not-infrequently suggested
that people could force a desired join order by writing an explicit JOIN
nest, and eliminating that escape hatch altogether didn't seem like a
good idea. I think from_collapse_limit was added largely on grounds of
symmetry.

Now, as to why the original commit had the geqo_threshold/2 restriction:
it was obviously not based on field experience with flattening, because
we didn't have any. What I think it *was* based on was that GEQO sucked
really badly back then, and I wanted to avoid having it kick in for
queries that it had never kicked in for in previous releases. Some
quick comparisons say that 7.1 in GEQO mode was about 5X slower than
HEAD (despite its planning being a lot more simplistic), and tended to
find considerably worse plans. Some of the significant improvements
since then:

2004-01-23 18:54 tgl

Revise GEQO planner to make use of some heuristic knowledge about
SQL, namely that it's good to join where there are join clauses
rather than where there are not. Also enable it to generate bushy
plans at need, so that it doesn't fail in the presence of multiple
IN clauses containing sub-joins.

2004-01-21 18:33 tgl

Repair error apparently introduced in the initial
coding of GUC: the default value for geqo_effort is supposed to be
40, not 1. The actual 'genetic' component of the GEQO algorithm
has been practically disabled since 7.1 because of this mistake.

Also, up to 7.0 there were some nasty memory leaks in the planner and
especially in GEQO, because we didn't have the memory context mechanism.
I think those were actually fixed as of 2000-09-29, but GEQO still had a
reputation for blowing out backend memory.

Now I'm still not exactly happy with GEQO, but it's surely a lot better
than it was in the fall of 2000. So on the whole it does seem that the
current relationships between from_collapse_limit, join_collapse_limit,
and geqo_threshold are based on obsolete information and should be
revisited. I don't have any data at hand to suggest specific new
default values, though.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-05-25 22:32:57 Re: generic options for explain
Previous Message Robert Haas 2009-05-25 22:12:13 Re: generic options for explain