Re: Add a greedy join search algorithm to handle large join problems

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, lakshmi <lakshmigcdac(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-05-13 11:54:20
Message-ID: 8e9509d3-b6a3-4604-bd26-8777386de216@vondra.me
Lists: pgsql-hackers

On 5/13/26 06:34, John Naylor wrote:
> On Tue, May 12, 2026 at 8:42 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> On 5/12/26 13:51, John Naylor wrote:
>>> On Mon, May 11, 2026 at 9:33 PM Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
>
>>>> GOO(combined) has the lower execution-time sum in four of the five
>>>> rounds. The exact bucket counts are not identical in every round, but
>>>> each round still has both large GOO(combined) wins and large
>>>> regressions. More importantly, GEQO varies much more across statistics
>>>> snapshots: its execution-time sum ranges from 51s to 575s, while
>>>> GOO(combined) stays in the 84s-88s range.
>>>
>>> That's an interesting finding! We'll want to keep this behavior in
>>> mind and see how reproducible it is. You mentioned you didn't want to
>>> draw any general conclusions (reasonable), but a 10x variation just
>>> from statistics does put a new perspective on the test results.
>>>
>>
>> After reading this, my thinking was "we should assume the estimates are
>> accurate" because with bogus estimates we can end up with arbitrarily
>> bad plans. Garbage in, garbage out.
>>
>> But maybe it's not as black-and-white? I agree if one method is much
>> more robust (i.e. performs better with estimates that are close enough
>> to the actual values), then that seems like an important feature for
>> this type of heuristic.
>
> Exactly. (The operative word being "if")
>
> At the very least, this variation can complicate getting reliable test results.
>
>> I wonder how "bad" the estimates are in the presented example, both in
>> the good and bad runs.
>
> Yeah, one deliberate feature of JOB is that it has real-world data
> distributions that confound a DBMS's common assumptions about data
> independence, etc. I wonder if a synthetic benchmark with more
> uniform/independent data would have less variation here.
>

Yes, that's my understanding too. What does that mean for using JOB to
evaluate these patches? If the optimizer makes substantial estimation
errors, what can we conclude from a heuristic performing well or poorly
on those queries?

I initially thought "garbage in, garbage out", but if a heuristic
performs systematically better (on a large number of random test cases),
maybe it handles "risk" better?
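
FWIW, to actually measure how "bad" the estimates are in those runs, the
per-node q-error (max of estimate/actual and actual/estimate) seems like
the natural metric. A rough, untested sketch, assuming a plan dumped with
EXPLAIN (ANALYZE, FORMAT JSON) into a hypothetical plan.json:

    # compute per-node q-errors from EXPLAIN (ANALYZE, FORMAT JSON) output
    import json

    def qerror(est, actual):
        # symmetric ratio, >= 1.0, where 1.0 means a perfect estimate
        est, actual = max(est, 1), max(actual, 1)
        return max(est / actual, actual / est)

    def walk(node, out):
        # note: "Actual Rows" is per-loop; ignoring "Actual Loops" here
        out.append((node["Node Type"],
                    qerror(node["Plan Rows"], node["Actual Rows"])))
        for child in node.get("Plans", []):
            walk(child, out)

    with open("plan.json") as f:
        plan = json.load(f)[0]["Plan"]

    errors = []
    walk(plan, errors)
    for node_type, q in sorted(errors, key=lambda e: -e[1]):
        print(f"{node_type}: q-error {q:.1f}")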

>>>> So I am still looking at the two follow-up directions mentioned earlier,
>>>> both building on the current GOO work. One is to improve pure-GOO
>>>> candidate generation and the final selector, for example by adding more
>>>> diverse strategies and making the selector consider signals beyond final
>>>> estimated cost. The other is to use exact DP for a prefix of the problem
>>>> and fall back to GOO when the DP budget is not enough. The first
>>>> direction may need broader changes across the planner or
>>>> estimation-related code,
>>>
>>> I agree we should try to avoid strategies that depend on changing other areas.
>>>
>>
>> IMHO it's futile to search for a perfect heuristic. If we had that, we
>> wouldn't need the DP mode at all. This can't be the criterion, because
>> no heuristic would pass it.
>
> Agreed.
>
>> TBH I don't quite understand what the proposed approach with "using
>> exact DP for a prefix of the problem" is meant to do. As of now we split
>> the join problem into smaller parts per join_collapse_limit (=8). If
>> these problems are "too large" for DP (geqo_threshold=12), we use the
>> heuristic (GEQO) mode. Or do I misremember this?
>>
>> Maybe I'm just "too used" to this, but this seems reasonable to me. Try
>> searching for a "perfect" solution first (within reason), and only for
>> large problems fall back to something approximate. Sensible, no?
>
> Yes, I'm very much in favor of that concept. My concern was, trying to
> do this now may be trying to do too much at once.
>

Probably. I understood the goal of GOO to be a drop-in replacement for
GEQO, and my intuition is to keep the scope limited to that goal.

That does not mean GOO can't do more later, but I'd definitely structure
the work as the "simplest possible" GEQO replacement first, with the
smarter stuff maybe coming later.
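
For context, the "simplest possible" greedy core is tiny. A toy sketch
(the card() model below is made up; the actual patch of course uses the
planner's cardinality estimates):

    # greedy operator ordering (GOO): repeatedly merge the pair of
    # relations whose join has the smallest estimated cardinality
    import math
    from itertools import combinations

    def goo(parts, card):
        """parts: list of frozensets of base-rel names
           card: frozenset of base-rel names -> estimated rows"""
        order = []
        while len(parts) > 1:
            left, right = min(combinations(parts, 2),
                              key=lambda p: card(p[0] | p[1]))
            parts = [p for p in parts if p not in (left, right)]
            parts.append(left | right)
            order.append((set(left), set(right)))
        return order

    if __name__ == "__main__":
        sizes = {"a": 10, "b": 1000, "c": 100, "d": 50}
        # pretend every join predicate has 10% selectivity
        card = lambda s: math.prod(sizes[r] for r in s) * 0.1 ** (len(s) - 1)
        print(goo([frozenset([r]) for r in sizes], card))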

>> To get to the GEQO/GOO code, people have to adjust the limits so that
>> (join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
>> so most join problems are smaller than geqo_threshold and thus handled
>> by regular DP (as smaller subproblems).
>>
>> But let's say someone adjusts the GUCs, gets to GOO, and it handles a
>> prefix of the problem using the DP approach. How is that different from
>> keeping the (join_collapse_limit < geqo_threshold) and not even getting
>> to GOO? Why not just leave join_collapse_limit at a low value?
>
> One flavor of the second idea above (found in the literature) is to
> start with DP, and after some new GUC limit (under the hood: # of
> times calling make_join_rel), pick one of the incomplete subproblems
> and finish with a heuristic search. It switches strategy on the fly,
> rather than choosing upfront. That's the difference from now, although
> I'm not sure offhand how join_collapse_limit chooses its subproblems
> out of the bigger problem.
>

I'm not sure about this, for two reasons.

How often do we expect this to be used? AFAIK GEQO is not used very
often in practice, because geqo_threshold is so much higher than
join_collapse_limit. So even huge joins get divided into problems
handled by DP. We might invest a lot of time and complexity into
something that gets used extremely rarely ...

Also, couldn't this easily lead to unpredictable planning behavior? Say
the budget relies on counting "make_join_rel" calls, or something like
that. Then someone adds or removes an index on one of the tables, and we
suddenly start to consider fewer/more candidate orders. Or maybe someone
refactors the code a little, moving the calls, and suddenly we hit the
threshold at a different place. But maybe that's OK for a heuristic?
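
To make that second concern concrete, here is roughly how I read the
proposal (a toy sketch with made-up names; estimated rows stand in for
cost, a counter of candidate joins stands in for make_join_rel() calls,
and a real implementation would reuse the partial DP state instead of
restarting greedily from scratch):

    # DP with a budget: enumerate candidate joins exhaustively, and once
    # the counter exceeds the budget, fall back to the greedy heuristic.
    # Anything that changes how fast the counter ticks moves the switch
    # point, and possibly the final plan.
    from itertools import combinations

    def greedy_tree(parts, card):
        # plain greedy pass, returning a nested (left, right) join tree
        trees = {p: next(iter(p)) for p in parts}
        while len(parts) > 1:
            left, right = min(combinations(parts, 2),
                              key=lambda p: card(p[0] | p[1]))
            merged = left | right
            trees[merged] = (trees[left], trees[right])
            parts = [p for p in parts if p not in (left, right)] + [merged]
        return trees[parts[0]]

    def join_tree(best, subset):
        # reconstruct the join tree from the DP table
        split = best[subset][1]
        if split is None:
            return next(iter(subset))
        return (join_tree(best, split[0]), join_tree(best, split[1]))

    def dp_with_budget(rels, card, budget):
        best = {frozenset([r]): (card(frozenset([r])), None) for r in rels}
        examined = 0
        for size in range(2, len(rels) + 1):
            for subset in map(frozenset, combinations(rels, size)):
                for k in range(1, size // 2 + 1):
                    for left in map(frozenset, combinations(subset, k)):
                        examined += 1
                        if examined > budget:
                            # budget exhausted: switch strategy on the fly
                            return greedy_tree(
                                [frozenset([r]) for r in rels], card)
                        right = subset - left
                        cost = best[left][0] + best[right][0] + card(subset)
                        if subset not in best or cost < best[subset][0]:
                            best[subset] = (cost, (left, right))
        return join_tree(best, frozenset(rels))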

>> Right. When replacing one heuristic with a different one, there will
>> almost certainly be regressions. Each heuristic will explore a slightly
>> different subset of the solution space, and it's a matter of luck which
>> gets a substantially better solution.
>>
>> I don't have a fully formed idea how to evaluate this, but I think the
>> only way to "prove" the new heuristic is better is to test a lot of
>> complex join queries and look at the overall statistics. See how long
>> the planning took, see how many queries got faster/slower, etc.
>>
>> The JOB is certainly one option to do that, and it's valuable because
>> the queries are meant to be realistic / from an actual application. But
>> there are not all that many of them.
>>
>> I think it would be useful to write a script that generates joins of
>> arbitrary complexity (number of relations, how connected they are), and
>> see how it works on those. It could even generate data to get estimates
>> with adjustable inaccuracy.
>
> +1
>
> With unnaturally uniform/independent data, I'm curious if the
> stats-dependent variation seen above for GEQO would disappear/lessen.
> (I'm not sure how hard adjustable inaccuracy would be)
>

Yeah, that was my guess too.
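
As for the generator script I suggested above, I'm thinking of something
along these lines (a rough sketch: only chain/star join shapes and
uniform, independent data so far; a knob for estimate inaccuracy would be
the obvious next step):

    # emit DDL for N tables with uniform data, plus a random equi-join
    # query over all of them; "shape" controls how connected the join
    # graph is (chain vs. star)
    import random

    def generate(n_tables, rows=10000, shape="chain", seed=0):
        rng = random.Random(seed)
        ddl, joins = [], []
        for i in range(n_tables):
            ddl.append(
                f"CREATE TABLE t{i} (id int, fk int);\n"
                f"INSERT INTO t{i} SELECT g, (random() * {rows})::int\n"
                f"  FROM generate_series(1, {rows}) g;\n"
                f"ANALYZE t{i};")
        tables = list(range(n_tables))
        rng.shuffle(tables)
        for a, b in zip(tables, tables[1:]):
            if shape == "star":
                joins.append(f"t{b}.fk = t{tables[0]}.id")
            else:
                joins.append(f"t{a}.fk = t{b}.id")
        query = ("SELECT count(*) FROM "
                 + ", ".join(f"t{i}" for i in range(n_tables))
                 + "\nWHERE " + "\n  AND ".join(joins) + ";")
        return "\n".join(ddl), query

    if __name__ == "__main__":
        ddl, query = generate(16, shape="chain")
        print(ddl)
        print(query)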

regards

--
Tomas Vondra
