Re: a path towards replacing GEQO with something better

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: a path towards replacing GEQO with something better
Date: 2021-06-14 11:16:58
Message-ID: CAFBsxsGCEZiqZZiBN1mMD26y4xUyqEkV00_6Eth8GEnBnko1_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 13, 2021 at 9:50 AM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

> > 2) We can still keep GEQO around (with some huge limit by default) for a
> > few years as an escape hatch, while we refine the replacement. If there
> > is some bug that prevents finding a plan, we can emit a WARNING and fall
> > back to GEQO. Or if a user encounters a regression in a big query, they
> > can lower the limit to restore the plan they had in an earlier version.
> >
>
> Not sure. Keeping significant amounts of code may not be free - both for
> maintenance and new features. It'd be a bit sad if someone proposed
> improvements to join planning, but had to do 2x the work to support it
> in both the DP and GEQO branches, or risk incompatibility.

Looking back again at the commit history, we did modify geqo to support
partial paths and partition-wise join, so that's a fair concern. My concern
is the risk of plan regressions after an upgrade, even if for a small
number of cases.

> OTOH maybe this concern is unfounded in practice - I don't think we've
> done very many big changes to geqo in the last few years.

Yeah, I get the feeling that it's already de facto obsolete, and we could
make it a policy not to consider improvements aside from bug fixes where it
can't find a valid plan, or forced API changes. Which I guess is another
way of saying "deprecated".

(I briefly considered turning it into a contrib module, but that seems like
the worst of both worlds.)

> This reminds me the proposals to have a GUC that'd determine how much
> effort should the planner invest into various optimizations. For OLTP it
> might be quite low, for large OLAP queries it'd be economical to spend
> more time trying some more expensive optimizations.
>
> The challenge of course is how / in what units / to define the budget,
> so that it's meaningful and understandable for users. Not sure if
> "number of join rels generated" will be clear enough for users. But it
> seems good enough for PoC / development, and hopefully people won't have
> to tweak it very often.

I'm also in favor of having some type of "planner effort" or "OLTP to OLAP
spectrum" guc, but I'm not yet sure whether it'd be better to have it
separate or to couple the joinrel budget to it. If we go that route, I
imagine there'll be many things that planner_effort changes that we don't
want to give a separate knob for. And, I hope with graceful degradation and
a good enough heuristic search, it won't be necessary to change in most
cases.

> I haven't read the [Kossmann & Stocker, 2000] paper yet, but the
> [Neumann, 2018] paper seems to build on it, and it seems to work with
> much larger subtrees of the join tree than k=5.

Right, in particular it builds on "IDP-2" from Kossmann & Stocker. Okay, so
Neumann's favorite algorithm stack "Adaptive" is complex, and I believe you
are referring to cases where they can iteratively improve up to 100 rels at
a time because of linearization. That's a separate algorithm (IKKBZ) that
complicates the cost model and also cannot have outer joins. If it has
outer joins, they use regular DP on subsets of size up to 10. It's not
substantively different from IDP-2, and that's the one I'd like to try to
gracefully fall back to. Or something similar.

> What I find fairly interesting is the section in [Neumann, 2018] about
> cardinality estimates, and quality of query plans when the estimates are
> off. The last paragraph in the 6.5 section essentially says that despite
> poor estimates, the proposed algorithm performs better than the simple
> (and cheap) heuristics. I'm not sure what to think about that, because
> my "intuitive" understanding is that the more elaborate the planning is,
> the more errors it can make when the estimates are off.

Yeah, I'm not sure this part is very useful and seems almost like an
afterthought. In table 3, all those poor examples are "pure" greedy
algorithms and don't have iterative refinement added, so it kind of makes
sense that poor estimates would hurt them more. But they don't compare
those with *just* a refinement step added. I also don't know how realistic
their "estimate fuzzing" is.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergey Fedchenko 2021-06-14 11:17:24 Re: BUG #15293: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events
Previous Message Matthias van de Meent 2021-06-14 09:53:47 Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic