Re: from_collapse_limit vs. geqo_threshold

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-26 01:00:21
Message-ID: 603c8f070905251800g5b86d2dav26eca7f417d15dbf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Now I'm still not exactly happy with GEQO, but it's surely a lot better
> than it was in the fall of 2000.  So on the whole it does seem that the
> current relationships between from_collapse_limit, join_collapse_limit,
> and geqo_threshold are based on obsolete information and should be
> revisited.  I don't have any data at hand to suggest specific new
> default values, though.

For 8.4, I'd be happy to just improve the documentation. I think this
sentence could just be deleted from the section on
from_collapse_limit:

It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">.

We could put some other explanation in place of that sentence, but I'm
not exactly sure what that explanation would say. I guess the point
is that setting from_collapse_limit < geqo_threshold may delay GEQO
planning considerably in the face of complex subqueries, because
pulling up subqueries increases the size of the FROM list (I think).
That could be good if you want your query plans to be more
deterministic, but there's no guarantee they'll be good. Setting
from_collapse_limit > geqo_threshold is basically saying that the
standard planner will always have subqueries pulled up, so
from_collapse_limit should be based on what the pain point will be for
GEQO.

I'm not sure there's a lot of point in spelling all that out, though.
It more or less follows from the definition of the parameters. So,
I'd be just as happy to delete the misleading hint and call it good.
But I could go either way.

For 8.5, it sounds like we need to do some testing to determine an
appropriate set of values, but I'm not exactly sure what to test. As
a practical matter, the correct level of effort depends a lot on how
long the query figures to run. For OLAP queries, planning times of
more than 50 ms or so start to add noticeably to the overall runtime
of the query, but if the query is expected to run for several minutes,
we'd presumably be happy to spend several seconds planning it, which
might make it feasible to use the standard planner even for very, very
big queries.

I'm not 100% convinced of the value of join_collapse_limit for
anything other than explicit control over the join order. I have yet
to meet a PostgreSQL who thought that it was intuitive that it might
matter whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3
[etc] or A, B, C, D, [etc] WHERE P1, P2, P3. I suspect there are many
people who, if they knew that the latter might optimize better than
the former in some circumstances, would simply always write it in the
latter fashion, which makes the whole thing look a lot like a
concealed foot-gun, since whether or not it actually protects you
against exponential planning-time growth has a lot to do with how you
happen to like to write your queries (myself, I've switched styles in
the last few years).

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-05-26 01:12:59 Re: generic options for explain
Previous Message Gevik Babakhani 2009-05-26 00:46:42 Re: usability of pg_get_function_arguments