From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: a path towards replacing GEQO with something better
Date: 2021-06-10 01:21:10
Message-ID: CAFBsxsE3Sb889r-Xvun+iAa5Onjy71m8nzp6JSH387JJF-YmrA@mail.gmail.com

Hi,

On occasion it comes up that the genetic query optimizer (GEQO) doesn't
produce particularly great plans, and is slow ([1] for example). The only
alternative that has gotten as far as a prototype patch (that I know of)
is a simulated annealing approach from some years ago, which didn't seem
to go anywhere.

The join problem as it pertains to Postgres has been described within the
community in [Gustafsson, 2017] and [Stroffek & Kovarik, 2007].

The fact that there is much more interest than code in this area indicates
that this is a hard problem. I hadn't given it much thought myself until by
chance I came across [Neumann, 2018], which describes a number of
interesting ideas. The key takeaway is that you want a graceful transition
between exhaustive search and heuristic search. In other words, if the
space of possible join orderings is just slightly larger than the maximum
allowed for exhaustive search, then the search should be *almost*
exhaustive. This not only increases the chances of finding a good plan, but
also has three engineering advantages I can think of:

1) It's natural to re-use data structures etc. already used by the existing
dynamic programming (DP) algorithm. Framing the problem as extending DP
greatly lowers the barrier to making realistic progress. If the problem is
framed as "find an algorithm as a complete drop-in replacement for GEQO",
it's a riskier project in my view.

2) We can still keep GEQO around (with some huge limit by default) for a
few years as an escape hatch, while we refine the replacement. If there is
some bug that prevents finding a plan, we can emit a WARNING and fall back
to GEQO. Or if a user encounters a regression in a big query, they can
lower the limit to restore the plan they had in an earlier version.

3) It actually improves the existing exhaustive search, because the
complexity of the join order problem depends on the query shape: a "chain"
shape (linear) vs. a "star" shape (as in star schema), for the most common
examples. The size of the DP table grows like this (for n >= 4):

Chain: (n^3 - n) / 6 (including bushy plans)
Star: (n - 1) * 2^(n - 2)

  n    chain                    star
-------------------------------------
  4       10                      12
  5       20                      32
  6       35                      80
  7       56                     192
  8       84                     448
  9      120                    1024
 10      165                    2304
 11      220                    5120
 12      286                   11264
 13      364                   24576
 14      455                   53248
 15      560                  114688
...
 64    43680   290536219160925437952

The math behind this is described in detail in [Ono & Lohman, 1990]. I
verified this in Postgres by instrumenting the planner to count how many
times it calls make_join_rel().
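
(Just for reference, the two formulas can be reproduced with a few lines
of Python; this is only to illustrate the growth rates, not planner code:

    # Number of joinrels built by exhaustive DP for the two extreme query
    # shapes, per [Ono & Lohman, 1990].  Valid for n >= 4.

    def chain_joinrels(n):
        # linear ("chain") join, bushy plans included
        return (n**3 - n) // 6

    def star_joinrels(n):
        # star join: one hub relation plus n - 1 satellites
        return (n - 1) * 2**(n - 2)

    for n in list(range(4, 16)) + [64]:
        print(n, chain_joinrels(n), star_joinrels(n))
)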

Imagine having a "join enumeration budget" that, if exceeded, prevents
advancing to the next join level. Given the above numbers and a query with
some combination of chain and star shapes, a budget of 400 can guarantee an
exhaustive search when there are up to 8 relations. For a pure chain join,
we can do an exhaustive search on up to 13 relations, for a similar cost of
time and space. Out of curiosity I tested HEAD with a chain query having 64
tables found in the SQLite tests [2] and found exhaustive search to take
only twice as long as GEQO. If we have some 30-way join, and some (> 400)
budget, it's actually possible that we will complete the exhaustive search
and get the optimal plan. This is a bottom-up way of determining the
complexity. Rather than configuring a number-of-relations threshold
and possibly having exponential behavior blow up in their faces, users can
configure something that somewhat resembles the runtime cost.
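
To make the budget idea a bit more concrete, here is a rough sketch in
Python. The names are invented for illustration and don't correspond to
existing planner functions; build_join_level() stands in for building all
joinrels at one join level and reporting how many were created (analogous
to counting make_join_rel() calls):

    # Sketch of a "join enumeration budget": count joinrel creations and
    # stop exhaustive DP at the last join level that fit within the budget.

    def dp_with_budget(num_rels, budget, build_join_level):
        used = 0
        level = 1                             # base relations are "level 1"
        while level < num_rels:
            level += 1
            used += build_join_level(level)   # joinrels created at this level
            if used > budget:
                # Out of budget: break out of DP here and hand the joinrels
                # built so far to a heuristic step (see below).
                return level, False
        return level, True                    # exhaustive search completed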

Now, I'll walk through one way that a greedy heuristic can integrate with
DP. In our 30-way join example, if we use up our budget and don't have a
valid plan, we'll break out of DP at the last join level we completed.
Since we already have built a number of joinrels, we build upon that work
as we proceed. The approach I have in mind is described in [Kossmann &
Stocker, 2000], which the authors call "iterative dynamic programming"
(IDP). I'll describe one of the possible variants here. Let's say we only
got as far as join level 8, so we've created up to 8-way joinrels. We pick
the best few (maybe 5%) of these 8-way joinrels by some measure (doesn't
have to be the full cost model) and on top of each of them create a full
plan quickly: At each join level, we only pick one base relation (again by
some measure) to create one new joinrel and then move to the next join
level. This is very fast, even with hundreds of relations.
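
A rough Python sketch of that greedy completion step might look like this.
The joinrel objects and the measure functions here (rank_joinrel(),
join_score(), plan.relids, plan.join()) are purely hypothetical stand-ins,
not the real planner data structures:

    # Greedy completion: from each of the best partial joinrels, repeatedly
    # join in the single most promising remaining base relation until the
    # plan is complete.

    def greedy_complete(partial, base_rels, join_score):
        plan = partial
        remaining = [r for r in base_rels if r not in plan.relids]
        while remaining:
            best = min(remaining, key=lambda r: join_score(plan, r))
            plan = plan.join(best)        # one new joinrel per "join level"
            remaining.remove(best)
        return plan

    def greedy_step(level_joinrels, base_rels, rank_joinrel, join_score,
                    keep_fraction=0.05):
        # Keep the best few (say 5%) joinrels of the highest completed join
        # level and build one complete plan on top of each.
        keep = max(1, int(len(level_joinrels) * keep_fraction))
        best_partials = sorted(level_joinrels, key=rank_joinrel)[:keep]
        return [greedy_complete(p, base_rels, join_score)
                for p in best_partials]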

Once we have one valid, complete plan, we can technically stop at any time
(Coding this much is also a good proof-of-concept). How much additional
effort we expend to find a good plan could be another budget we have. With
a complete plan obtained quickly, we also have an upper bound on the
measure of the cheapest overall plan, so with that we can prune any more
expensive plans as we iterate through the 8-way joinrels. Once we have a
set of complete plans, we pick some of them to improve the part of the plan
picked during the greedy step. For some size k (typically between 4 and 7),
we divide the greedy-step part of the join into k-sized sections. So with
our 30-table join where we started with an 8-way joinrel, we have 22
tables. If k=5, we run standard dynamic programming (including the standard
cost model) on four 5-table sections and then once more on the final 2-table
section.

You can also think of it like this: We quickly find 5 tables that likely
would be good to join next, find the optimal join order among the 5, then
add that to our joinrel. We keep doing that until we get a valid plan. The
only difference is, performing the greedy step to completion allows us to
prune subsequent bad intermediate steps.
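
Here is a sketch of that improvement step in the same hypothetical style;
dp_optimize() stands in for running the existing DP machinery, full cost
model included, on a small set of relations on top of an existing joinrel:

    # IDP-style improvement: split the part of the join order that was
    # chosen greedily into sections of size k and re-optimize each section
    # with standard DP on top of the plan built so far.

    def idp_improve(initial_joinrel, greedy_order, k, dp_optimize):
        plan = initial_joinrel
        for i in range(0, len(greedy_order), k):
            section = greedy_order[i:i + k]
            plan = dp_optimize(plan, section)  # optimal order within section
        return plan

    # In the 30-table example that got as far as an 8-way joinrel,
    # greedy_order has 22 entries; k=5 gives four 5-table sections and one
    # final 2-table section.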

By "some measure" above I'm being a bit hand-wavy, but at least in the
literature I've read, fast heuristic algorithms seem to use simpler and
cheaper-to-compute metrics like intermediate result size or selectivity,
rather than a full cost function. That's a detail to be worked out. Also,
it must be said that in choosing among intermediate steps we need to be
careful to include things like:

- interesting sort orders
- partition-wise joins
- parameterized paths
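
As a strawman for the "some measure" above, the simplest candidate from the
literature is estimated intermediate result size; purely for illustration:

    # Strawman measure: rank candidate joins by estimated output
    # cardinality instead of by the full cost model; smaller is better.

    def join_score(outer_rows, inner_rows, join_selectivity):
        return outer_rows * inner_rows * join_selectivity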

A further way of extending DP, somewhat orthogonal to the above, is to do
pruning during the initial DP step.
Looking again at how quickly the join enumeration for star queries grows
with increasing "n", it makes sense that a large number of those are bad
plans. In [Das & Haritsa, 2006], the authors demonstrate a method of
extending the reach of DP by pruning joinrels at each join level by two
criteria:

1) Whether the joinrel contains a hub relation (i.e. is the center of a
star)
2) A skyline function taking into account cost, cardinality, and selectivity

This way, the worst joinrels of star queries are pruned and the initial
join budget I mentioned above goes a lot farther.
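
For illustration, a minimal sketch of the skyline part follows. This is my
loose reading of the idea rather than a faithful rendering of the paper's
algorithm, and the hub-relation criterion would be an additional filter on
top of it:

    # Skyline pruning at one join level: a joinrel survives only if no
    # other joinrel at the same level is at least as good on every metric
    # (cost, cardinality, selectivity) and strictly better on at least one.

    def dominates(a, b):
        return (all(x <= y for x, y in zip(a.metrics, b.metrics))
                and a.metrics != b.metrics)

    def skyline_prune(joinrels):
        return [j for j in joinrels
                if not any(dominates(other, j) for other in joinrels)]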

There are quite a few details and variations I left out (testing, for one),
but this is enough to show the idea. I plan on working on this during the
PG15 cycle. I'd appreciate any feedback on the above.
--

[1] https://www.postgresql.org/message-id/15658.1241278636@sss.pgh.pa.us

[Stroffek & Kovarik, 2007]
https://www.pgcon.org/2007/schedule/attachments/28-Execution_Plan_Optimization_Techniques_Stroffek_Kovarik.pdf

[Gustafsson, 2017]
https://www.postgresql.eu/events/pgconfeu2017/sessions/session/1586/slides/26/Through_the_Joining_Glass-PGConfeu-DanielGustafsson.pdf

[Neumann, 2018] Adaptive Optimization of Very Large Join Queries.
https://dl.acm.org/doi/10.1145/3183713.3183733

[Ono & Lohman, 1990] Measuring the Complexity of Join Enumeration in Query
Optimization.
https://www.csd.uoc.gr/~hy460/pdf/MeasuringtheComplexityofJoinEnumerationinQueryOptimization.PDF

[2] https://www.sqlite.org/sqllogictest/file?name=test/select5.test
(Note: there are no explicit join clauses so "from" limits didn't have an
effect in my quick test.)

[Kossmann & Stocker, 2000] Iterative dynamic programming: a new class of
query optimization algorithms. https://doi.org/10.1145/352958.352982

[Das & Haritsa, 2006] Robust Heuristics for Scalable Optimization of
Complex SQL Queries.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.549.4331&rep=rep1&type=pdf

--
John Naylor
EDB: http://www.enterprisedb.com
