Re: Add a greedy join search algorithm to handle large join problems

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, lakshmi <lakshmigcdac(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-05-13 12:22:53
Message-ID: 840aa04d-2302-4197-b885-70a7e0064bcb@vondra.me
Lists: pgsql-hackers

Hi!

On 5/13/26 08:06, Chengpeng Yan wrote:
> Hi Tomas,
>
> Thanks for the detailed comments. I do not think I can answer all of the
> questions yet, but I would like to first align my current understanding
> and reply to the parts where I have some thoughts. For the remaining
> parts, I think I need more analysis/research before making stronger
> claims.
>

That's fine. No one expects you to have all the answers right away. It's
a matter for future research. Also, these are just random ideas, and some of
them may be misguided - feel free to push back against that, please.

>> On May 12, 2026, at 21:42, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> After reading this, my thinking was "we should assume the estimates are
>> accurate" because with bogus estimates we can end up with arbitrarily
>> bad plans. Garbage in, garbage out.
>>
>> But maybe it's not as black-and-white? I agree if one method is much
>> more robust (i.e. performs better with estimates that are close enough
>> to the actual values), then that seems like an important feature for
>> this type of heuristics.
>>
>> I wonder how "bad" the estimates are in the presented example, both in
>> the good and bad runs.
>
> I agree with this point. My earlier benchmark was more of an end-to-end
> test with current postgres estimates, including cases where the
> estimates can be quite wrong. I used that setup because complex join
> estimates seem very hard to make accurate in all cases, so these errors
> are likely to remain part of the practical problem.
>
> But for evaluating the join-order algorithm itself, I agree we also need
> a setup where the estimates are closer to the actual values, for example
> by first measuring actual row counts/cardinalities and then using those
> values instead of normal planner estimates. That would answer a
> different question: the current results show end-to-end behavior with
> the current postgres estimator, while the additional setup would better
> isolate the search algorithm.
>
> If the estimates are very wrong, the selected plan may be dominated by
> estimator error rather than by the search method. In that case, I am not
> yet sure how much a win/loss between two heuristics proves about the
> join search algorithm itself. On the other hand, if the estimates are
> close enough and one method is still more robust, I agree that would be
> an important property for this kind of heuristic.
>
> For the presented examples, I need to look more carefully at the good
> and bad runs. From a first look at some severe regressions, the issue
> seems to be that some intermediate join row estimates are off by orders
> of magnitude, but I do not want to claim that as the full answer before
> doing a more systematic analysis.
>

Agreed. Seeing results for joins with accurate estimates would be helpful.

I initially thought we should probably just ignore behavior on examples
with too inaccurate estimates (because the "exact" DP would likely make
bogus decisions too).

But maybe one of the heuristics is somehow more robust, and that seems
like a quality of its own?
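
FWIW, one quick way to see how far off the estimates actually are would be
to walk the EXPLAIN ANALYZE output and compare estimated vs. actual rows
for every join node. A rough sketch (psycopg2; the database name and query
file are made up, and "Actual Rows" is averaged over loops, but it's good
enough for a first picture):

import json
import psycopg2

def join_errors(node, out):
    # collect (estimated, actual) row counts for every join node in the plan tree
    if node["Node Type"] in ("Nested Loop", "Hash Join", "Merge Join"):
        out.append((node["Plan Rows"], node["Actual Rows"]))
    for child in node.get("Plans", []):
        join_errors(child, out)
    return out

conn = psycopg2.connect(dbname="imdb")          # made-up database name
cur = conn.cursor()
cur.execute("EXPLAIN (ANALYZE, FORMAT JSON) " + open("q1.sql").read())
plan = json.loads(cur.fetchone()[0])[0]["Plan"]

for est, act in join_errors(plan, []):
    q = max(est, 1) / max(act, 1)
    print(f"est={est:>12}  actual={act:>12}  q-error={max(q, 1 / q):.1f}")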

>> IMHO it's futile to search for a perfect heuristic. If we had that, we
>> wouldn't need the DP mode at all. This can't be the criterion, because no
>> heuristic would pass that.
>
> Agreed. I did not mean that the goal should be a heuristic with no bad
> cases. That would not be realistic.
>
> What I had in mind was a weaker practical goal: assuming planning time
> stays acceptable and plan advice can still be used to correct bad cases,
> can the default setting reduce the probability of severe bad cases? For
> example, using purely made-up numbers, if the new algorithm's default
> setting has severe regressions in 10% of cases, can we reduce that to 5%
> or lower? The point is to reduce the probability of bad cases, not to
> eliminate them.
>
> I am not fully sure yet whether this is the right goal/criterion,
> though.
>

That sounds like a sensible goal. It's more or less what I had in mind
when I proposed evaluating the heuristics on random joins generated by a
script. Seeing in what fraction of the cases "A" is better than "B" and
vice versa, by how much, ...
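
I imagine the summary itself can be pretty simple - something like this
sketch, assuming we dump per-query execution times for the two heuristics
into CSV files (the file and column names are made up):

import csv

def load(path):
    # hypothetical layout: one row per query with columns "query" and "exec_ms"
    with open(path) as f:
        return {r["query"]: float(r["exec_ms"]) for r in csv.DictReader(f)}

a = load("heuristic_a.csv")
b = load("heuristic_b.csv")

wins = losses = ties = 0
worst = 1.0
for q in a.keys() & b.keys():
    ratio = a[q] / b[q]
    worst = max(worst, ratio)
    if ratio < 0.9:
        wins += 1        # A at least 10% faster
    elif ratio > 1.1:
        losses += 1      # A at least 10% slower
    else:
        ties += 1

print(f"A better: {wins}  B better: {losses}  ties: {ties}  "
      f"worst A/B ratio: {worst:.2f}")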

>> Regarding the two proposed directions, it's not very clear to me why the
>> first direction would require broader changes, particularly changes that
>> would be unnecessary for the second direction (DP for prefix).
>>
>> To me it seems the second approach is actually an advanced version of
>> the first one. How do you pick the prefix to handle by DP, if not by some
>> heuristics requiring these new signals?
>
> That's fair. The DP-prefix approach still needs a heuristic to choose
> the frontier seed, so it does not avoid the ranking problem. What I
> meant was mostly about scope, not a strict dependency between the two
> directions. The current GOO variants mostly use signals already
> available in postgres at that point, such as estimated cost and
> estimated result size. If those signals are wrong, different variants
> may still share similar failure patterns. To get a substantially
> different signal, we might need something outside the current GOO patch.
>
> For example, the CIDR 2021 paper "Simplicity Done Right for Join
> Ordering" seems like one possible direction for the first approach. My
> current reading is that it is not only changing a local choice inside
> the current GOO rule. It also changes the way candidate joins are
> formed/connected, and it may need extra statistics about join
> attributes, such as maximum value frequency, to make the risk signal
> work. That seems broader than only changing the GOO search rule.
>
> That is why I originally looked at the DP-prefix direction first: not
> because I think it is definitely the better direction, but because it
> seemed more contained to the join search algorithm and could initially
> reuse existing signals like cost/result size. Better signals could still
> be used later for prefix/frontier selection if that direction turns out
> to be useful. I am probably still missing some of the tradeoffs, and I
> need to study the paper, the failure cases, and other people's views
> more carefully.
>
> After this discussion, it may make more sense to first make the pure GOO
> baseline and the evaluation framework clearer, and then decide which
> follow-up direction is worth pursuing.
>

+1 to do "simple GOO" first, followed by various improvements. It could
even be implemented in one patch series, but structured like that. It
makes review / discussions way simpler.
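
Just to make sure we're talking about the same thing, my understanding of
the "simple GOO" baseline is roughly this (a toy sketch, not the patch -
join_rows() stands in for the planner's cardinality estimate, and the real
code of course works on RelOptInfo):

def goo_join_order(relations, join_rows):
    """Greedy operator ordering: repeatedly join the pair of partial joins
    with the smallest estimated result size.

    relations       -- list of base relation names
    join_rows(a, b) -- estimated rows of joining partial joins a and b,
                       or None if they share no join clause
    """
    partial = [frozenset([r]) for r in relations]
    order = []
    while len(partial) > 1:
        best = None
        for i in range(len(partial)):
            for j in range(i + 1, len(partial)):
                rows = join_rows(partial[i], partial[j])
                if rows is None:
                    continue                    # avoid cross joins
                if best is None or rows < best[0]:
                    best = (rows, i, j)
        rows, i, j = best                       # assumes a connected join graph
        order.append((partial[i], partial[j]))
        merged = partial[i] | partial[j]
        partial = [p for k, p in enumerate(partial) if k not in (i, j)]
        partial.append(merged)
    return order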

>> TBH I don't quite understand what the proposed approach with "using
>> exact DP for a prefix of the problem" is meant to do. As of now we split
>> the join problem into smaller parts per join_collapse_limit (=8). If
>> these problems are "too large" for DP (geqo_threshold=12), we use the
>> heuristics (GEQO) mode. Or do I misremember this?
>>
>> Maybe I'm just "too used" to this, but this seems reasonable to me. Try
>> searching for a "perfect" solution first (within reason), and only for
>> large problems fall back to something approximate. Sensible, no?
>>
>> To get to the GEQO/GOO code, people have to adjust the limits, so that
>> (join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
>> so most join problems are smaller than geqo_threshold and so handled by
>> regular DP (for smaller subproblems).
>>
>> But let's say someone adjusts the GUCs, gets to GOO, and it handles a
>> prefix of the problem using the DP approach. How is that different from
>> keeping the (join_collapse_limit < geqo_threshold) and not even getting
>> to GOO? Why not just leave join_collapse_limit at a low value?
>
> This is a very good question. I need to better understand the current
> implementation and think about this more carefully before giving an
> answer.
>
>> Right. When replacing one heuristic with a different one, there will
>> almost certainly be regressions. Each heuristic will explore a slightly
>> different subset of the solution space, and it's a matter of luck which
>> gets a substantially better solution.
>>
>> I don't have a fully formed idea how to evaluate this, but I think the
>> only way to "prove" the new heuristic is better is to test a lot of
>> complex join queries, and look at the overall statistics. See how long
>> the planning took, see how many queries got faster/slower, etc.
>
> Agreed. I think we need to test many more large join queries. As
> mentioned above, I think it would be useful to keep the current-estimate
> and accurate-estimate evaluations separate, because they answer related
> but different questions.
>

+1

>> The JOB is certainly one option to do that, and it's valuable because
>> the queries are meant to be realistic / from actual application. But
>> there's not all that many of them.
>
> Yes. In addition to JOB and JOB-Complex, the benchmark repository also
> has an extended IMDB-backed workload, `imdb_ceb_3k`, with 842 queries
> involving at least 12 base relations [1]. It uses the same IMDB
> schema/data family, so it should be a useful larger test set after the
> smaller JOB/JOB-Complex subset.
>
> I have not run that full set for this patch yet, but I plan to use it
> for the next round of evaluation.
>

Maybe it'd make sense to evaluate GEQO/GOO even with smaller joins?

I don't know if that tells us anything, or which geqo_threshold values
are interesting. Maybe not all the way back to 2, because as John wrote
earlier, for small problems GEQO may actually cover the whole space.

But maybe if we run the tests with geqo_threshold=2..12 for joins of
different sizes, it'd tell us a couple things:

a) At which value does it start producing plans different from DP?

b) How the planning time grows for DP/GEQO/GOO, and where the DP gets
too expensive.

c) How the execution time changes for DP/GEQO/GOO.

I honestly don't know what exactly it makes sense to test. In cases like
this I usually run a lot of tests (generated randomly) for different
combinations of parameters, and then "ask questions" on the data.
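
As a rough sketch of the kind of sweep I mean (psycopg2 again; the
queries/ directory and database name are assumptions, and "SET geqo = off"
gives the DP baseline to compare plan shapes against):

import glob
import json
import psycopg2

def shape(node):
    # Keep only the structural part of a plan node (node type, scanned relation,
    # children), so plans can be compared across runs despite timing/row noise.
    return (node["Node Type"], node.get("Relation Name"),
            tuple(shape(c) for c in node.get("Plans", [])))

def explain_analyze(cur, sql):
    cur.execute("EXPLAIN (ANALYZE, FORMAT JSON) " + sql)
    result = json.loads(cur.fetchone()[0])[0]
    return result["Planning Time"], result["Execution Time"], shape(result["Plan"])

conn = psycopg2.connect(dbname="imdb")               # assumed database name
conn.autocommit = True
cur = conn.cursor()
cur.execute("SET join_collapse_limit = 32")          # let GEQO/GOO see the whole problem

for path in sorted(glob.glob("queries/*.sql")):      # assumed query directory
    sql = open(path).read()

    cur.execute("SET geqo = off")                    # DP baseline
    _, _, dp_shape = explain_analyze(cur, sql)
    cur.execute("SET geqo = on")

    for threshold in range(2, 13):
        cur.execute(f"SET geqo_threshold = {threshold}")
        plan_ms, exec_ms, plan_shape = explain_analyze(cur, sql)
        print(path, threshold, round(plan_ms, 2), round(exec_ms, 2),
              "same-as-DP" if plan_shape == dp_shape else "different")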

>> I think it would be useful to write a script that generates joins of
>> arbitrary complexity (number of relations, how connected they are), and
>> see how it works on those. It could even generate data to get estimates
>> with adjustable inaccuracy.
>
> That makes sense, but I need to think more about it. I am not sure yet
> whether this would be easy to design in a useful way.
>

Me neither. But it seems like an interesting thought experiment.
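
What I had in mind is roughly a generator like this (a sketch only -
table/column names are made up, and the part that generates data with
adjustable estimate accuracy is left out):

import random

def random_join_query(n_rels, connectivity=0.3, seed=None):
    """Generate a join query over tables t1..tN.

    A spanning chain of join clauses keeps the join graph connected;
    additional edges are added with the given probability to control
    how densely connected the problem is.
    """
    rng = random.Random(seed)
    rels = [f"t{i}" for i in range(1, n_rels + 1)]
    clauses = [f"{rels[i - 1]}.id = {rels[i]}.fk" for i in range(1, n_rels)]
    for i in range(n_rels):
        for j in range(i + 2, n_rels):
            if rng.random() < connectivity:
                clauses.append(f"{rels[i]}.id = {rels[j]}.fk")
    return ("SELECT count(*) FROM " + ", ".join(rels) +
            " WHERE " + " AND ".join(clauses))

print(random_join_query(12, connectivity=0.2, seed=42))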

regards

--
Tomas Vondra
