Re: Add a greedy join search algorithm to handle large join problems

From: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, lakshmi <lakshmigcdac(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-05-13 06:06:16
Message-ID: 02FD9606-3748-49C6-A5CC-50CFB3AFB890@outlook.com
Lists: pgsql-hackers

Hi Tomas,

Thanks for the detailed comments. I do not think I can answer all of the
questions yet, so let me first lay out my current understanding and
reply to the parts where I already have some thoughts. For the rest, I
need more analysis/research before making stronger claims.

> On May 12, 2026, at 21:42, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> After reading this, my thinking was "we should assume the estimates are
> accurate" because with bogus estimates we can end up with arbitrarily
> bad plans. Garbage in, garbage out.
>
> But maybe it's not as black-and-white? I agree if one method is much
> more robust (i.e. performs better with estimates that are close enough
> to the actual values), then that seems like an important feature for
> this type of heuristics.
>
> I wonder how "bad" the estimates are in the presented example, both in
> the good and bad runs.

I agree with this point. My earlier benchmark was more of an end-to-end
test with current postgres estimates, including cases where the
estimates can be quite wrong. I used that setup because complex join
estimates seem very hard to make accurate in all cases, so these errors
are likely to remain part of the practical problem.

But for evaluating the join-order algorithm itself, I agree we also need
a setup where the estimates are closer to the actual values, for example
by first measuring actual row counts/cardinalities and then using those
values instead of normal planner estimates. That would answer a
different question: the current results show end-to-end behavior with
the current postgres estimator, while the additional setup would better
isolate the search algorithm.
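As a first step toward that setup, one cheap way to quantify how wrong the
estimates are per plan node is the q-error, max(est/act, act/est), computed
from EXPLAIN (ANALYZE, FORMAT JSON) output. This is only a hypothetical
helper sketch, not part of the patch; the toy plan fragment below is made up:

```python
# Hypothetical sketch: walk an EXPLAIN (ANALYZE, FORMAT JSON) plan tree
# and compute the q-error max(est/act, act/est) for every node.

def q_errors(node):
    """Yield (node_type, q_error) for each node in a plan tree."""
    est = node.get("Plan Rows", 0)
    act = node.get("Actual Rows", 0)
    if est > 0 and act > 0:
        yield node["Node Type"], max(est / act, act / est)
    for child in node.get("Plans", []):
        yield from q_errors(child)

# Toy fragment shaped like EXPLAIN (ANALYZE, FORMAT JSON) output.
plan = {
    "Node Type": "Hash Join",
    "Plan Rows": 100, "Actual Rows": 100000,   # off by 1000x
    "Plans": [
        {"Node Type": "Seq Scan", "Plan Rows": 1000, "Actual Rows": 1100},
        {"Node Type": "Seq Scan", "Plan Rows": 500, "Actual Rows": 500},
    ],
}

print(list(q_errors(plan)))
```

The measured actual row counts could then be fed back (e.g. via hints) in
place of the planner estimates to isolate the search algorithm.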

If the estimates are very wrong, the selected plan may be dominated by
estimator error rather than by the search method. In that case, I am not
yet sure how much a win/loss between two heuristics proves about the
join search algorithm itself. On the other hand, if the estimates are
close enough and one method is still more robust, I agree that would be
an important property for this kind of heuristic.

For the presented examples, I need to look more carefully at the good
and bad runs. From a first look at some severe regressions, the issue
seems to be that some intermediate join row estimates are off by orders
of magnitude, but I do not want to claim that as the full answer before
doing a more systematic analysis.

> IMHO it's futile to search for a perfect heuristics. If we had that, we
> wouldn't need the DP mode at all. This can't be the criterion, because no
> heuristics would pass that.

Agreed. I did not mean that the goal should be a heuristic with no bad
cases. That would not be realistic.

What I had in mind was a weaker practical goal: assuming planning time
stays acceptable and plan advice can still be used to correct bad cases,
can the default setting reduce the probability of severe bad cases? For
example, using purely made-up numbers, if the new algorithm's default
setting has severe regressions in 10% of cases, can we reduce that to 5%
or lower? The point is to reduce the probability of bad cases, not to
eliminate them.

I am not fully sure yet whether this is the right goal/criterion,
though.
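To make the made-up numbers above concrete, the statistic I have in mind is
just the fraction of queries whose plan is more than some factor slower than
the baseline. A tiny sketch (the runtimes and the 2x "severe" threshold are
both hypothetical):

```python
def severe_regression_rate(baseline_ms, candidate_ms, factor=2.0):
    """Fraction of queries where the candidate plan runs more than
    `factor` times slower than the baseline plan."""
    pairs = list(zip(baseline_ms, candidate_ms))
    bad = sum(1 for b, c in pairs if c > factor * b)
    return bad / len(pairs)

baseline  = [10, 20, 30, 40, 50]   # made-up per-query runtimes (ms)
candidate = [11, 90, 28, 41, 52]   # one severe regression (20 -> 90)
print(severe_regression_rate(baseline, candidate))  # → 0.2
```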

> Regarding the two proposed directions, it's not very clear to me why the
> first direction would require broader changes, particularly changes that
> would be unnecessary for the second direction (DP for prefix).
>
> To me it seems the second approach is actually an advanced version of
> the first one. How do you pick the prefix to handle by DP, if not by some
> heuristics requiring these new signals?

That's fair. The DP-prefix approach still needs a heuristic to choose
the frontier seed, so it does not avoid the ranking problem. What I
meant was mostly about scope, not a strict dependency between the two
directions. The current GOO variants mostly use signals already
available in postgres at that point, such as estimated cost and
estimated result size. If those signals are wrong, different variants
may still share similar failure patterns. To get a substantially
different signal, we might need something outside the current GOO patch.

For example, the CIDR 2021 paper "Simplicity Done Right for Join
Ordering" seems like one possible direction for the first approach. My
current reading is that it is not only changing a local choice inside
the current GOO rule. It also changes the way candidate joins are
formed/connected, and it may need extra statistics about join
attributes, such as maximum value frequency, to make the risk signal
work. That seems broader than only changing the GOO search rule.

That is why I originally looked at the DP-prefix direction first: not
because I think it is definitely the better direction, but because it
seemed more contained to the join search algorithm and could initially
reuse existing signals like cost/result size. Better signals could still
be used later for prefix/frontier selection if that direction turns out
to be useful. I am probably still missing some of the tradeoffs, and I
need to study the paper, the failure cases, and other people's views
more carefully.

After this discussion, it may make more sense to first make the pure GOO
baseline and the evaluation framework clearer, and then decide which
follow-up direction is worth pursuing.
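For reference, the pure GOO baseline is roughly the following greedy loop:
repeatedly join the pair of (partial) relations with the smallest estimated
result size. This is a toy sketch with made-up cardinalities and per-edge
selectivities; the actual patch works against PostgreSQL's cost model rather
than raw result size:

```python
def goo(cards, selectivities):
    """Greedy Operator Ordering sketch.
    cards: {rel: rows}; selectivities: {frozenset({a, b}): sel}.
    Returns (join_tree, estimated_final_cardinality)."""
    # each component: (tree, set_of_base_rels, cardinality)
    comps = [(r, {r}, n) for r, n in cards.items()]

    def est(a, b):
        # cardinality product times every selectivity connecting
        # base relations across the two components (1.0 = cross join)
        size = a[2] * b[2]
        for x in a[1]:
            for y in b[1]:
                size *= selectivities.get(frozenset({x, y}), 1.0)
        return size

    while len(comps) > 1:
        i, j = min(((i, j) for i in range(len(comps))
                           for j in range(i + 1, len(comps))),
                   key=lambda ij: est(comps[ij[0]], comps[ij[1]]))
        a, b = comps[i], comps[j]
        merged = ((a[0], b[0]), a[1] | b[1], est(a, b))
        comps = [c for k, c in enumerate(comps) if k not in (i, j)]
        comps.append(merged)
    return comps[0][0], comps[0][2]

cards = {"t1": 1000, "t2": 100, "t3": 10000}
sels = {frozenset({"t1", "t2"}): 0.001, frozenset({"t2", "t3"}): 0.00005}
tree, rows = goo(cards, sels)
print(tree, round(rows))  # → ('t1', ('t2', 't3')) 50
```

The DP-prefix variant would replace some iterations of this loop with an
exhaustive search over a small subset, which is why it can reuse the same
cost/size signals.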

> TBH I don't quite understand what the proposed approach with "using
> exact DP for a prefix of the problem" is meant to do. As of now we split
> the join problem into smaller parts per join_collapse_limit (=8). If
> these problems are "too large" for DP (geqo_threshold=12), we use the
> heuristics (GEQO) mode. Or do I misremember this?
>
> Maybe I'm just "too used" to this, but this seems reasonable to me. Try
> searching for a "perfect" solution first (within reason), and only for
> large problems fall back to something approximate. Sensible, no?
>
> To get to the GEQO/GOO code, people have to adjust the limits, so that
> (join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
> so most join problems are smaller than geqo_threshold and so handled by
> regular DP (for smaller subproblems).
>
> But let's say someone adjusts the GUCs, gets to GOO, and it handles a
> prefix of the problem using the DP approach. How is that different from
> keeping (join_collapse_limit < geqo_threshold) and not even getting
> to GOO? Why not just leave join_collapse_limit at a low value?

This is a very good question. I need to better understand the current
implementation and think about this more carefully before giving an
answer.

> Right. When replacing one heuristics with a different one, there will
> almost certainly be regressions. Each heuristics will explore a slightly
> different subset of the solution space, and it's a matter of luck which
> gets a substantially better solution.
>
> I don't have a fully formed idea how to evaluate this, but I think the
> only way to "prove" the new heuristics is better is to test a lot of
> complex join queries and look at the overall statistics. See how long
> the planning took, see how many queries got faster/slower, etc.

Agreed. I think we need to test many more large join queries. As
mentioned above, I think it would be useful to keep the current-estimate
and accurate-estimate evaluations separate, because they answer related
but different questions.

> The JOB is certainly one option to do that, and it's valuable because
> the queries are meant to be realistic / from actual application. But
> there's not all that many of them.

Yes. In addition to JOB and JOB-Complex, the benchmark repository also
has an extended IMDB-backed workload, `imdb_ceb_3k`, with 842 queries
involving at least 12 base relations [1]. It uses the same IMDB schema
and data, so it should be a useful larger test set beyond the smaller
JOB/JOB-Complex sets.

I have not run that full set for this patch yet, but I plan to use it
for the next round of evaluation.

> I think it would be useful to write a script that generates joins of
> arbitrary complexity (number of relations, how connected they are), and
> see how it works on those. It could even generate data to get estimates
> with adjustable inaccuracy.

That makes sense, but I need to think more about it. I am not sure yet
whether this would be easy to design in a useful way.
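As a very rough starting point, a generator might build a random connected
join graph (a spanning tree for connectivity, plus extra edges to control
density) and emit equi-join SQL. Everything below, the table names t1..tn
and the single join column `id`, is made up purely for illustration:

```python
import random

def random_join_query(n_rels, extra_edges=0, seed=0):
    """Emit SQL for a connected random equi-join over n_rels
    hypothetical tables t1..tn, each with an integer column `id`."""
    rng = random.Random(seed)
    rels = [f"t{i}" for i in range(1, n_rels + 1)]
    edges = set()
    # spanning tree: connect each relation to a random earlier one,
    # which guarantees the join graph is connected
    for i in range(1, n_rels):
        edges.add(tuple(sorted((rels[rng.randrange(i)], rels[i]))))
    # extra edges make the graph more densely connected
    target = min(n_rels - 1 + extra_edges, n_rels * (n_rels - 1) // 2)
    while len(edges) < target:
        a, b = rng.sample(rels, 2)
        edges.add((min(a, b), max(a, b)))
    preds = " AND ".join(f"{a}.id = {b}.id" for a, b in sorted(edges))
    return f"SELECT count(*) FROM {', '.join(rels)} WHERE {preds};"

print(random_join_query(5, extra_edges=2, seed=42))
```

Generating skewed data per table, so the estimate inaccuracy is adjustable,
is the part I am less sure how to design well.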

References:

[1] Benchmark workloads:
https://github.com/Reminiscent/join-order-benchmark/blob/main/WORKLOADS.md
--
Best regards,
Chengpeng Yan
