Re: Add a greedy join search algorithm to handle large join problems

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, lakshmi <lakshmigcdac(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-05-14 05:59:55
Message-ID: CANWCAZZkHwULLrZhyXbsyi5adXjwDTKFX0PZ9Z5pep1zh2=gAw@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 13, 2026 at 6:54 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 5/13/26 06:34, John Naylor wrote:
> > On Tue, May 12, 2026 at 8:42 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> >> To get to the GEQO/GOO code, people have to adjust the limits, so that
> >> (join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
> >> so most join problems are smaller than geqo_threshold and so handled by
> >> regular DP (for smaller subproblems).
> >>
> >> But let's say someone adjusts the GUCs, gets to GOO, and it handles a
> >> prefix of the problem using the DP approach. How is that different from
> >> keeping the (join_collapse_limit < geqo_threshold) and not even getting
> >> to GOO? Why not just leave join_collapse_limit at a low value?
> >
> > One flavor of the second idea above (found in the literature) is to
> > start with DP, and after some new GUC limit (under the hood: # of
> > times calling make_join_rel), pick one of the incomplete subproblems
> > and finish with a heuristic search. It switches strategy on the fly,
> > rather than choosing upfront. That's the difference from now, although
> > I'm not sure offhand how join_collapse_limit chooses its subproblems
> > out of the bigger problem.
> >
>
> Not sure, for two reasons.
>
> How often do we expect this to be used? AFAIK GEQO is not used very
> often in practice, because geqo_threshold is so much higher than
> join_collapse_limit. So even huge joins get divided into problems
> handled by DP. We might invest a lot of time and complexity into
> something that gets used extremely rarely ...

That's a risk to keep in mind.

In cases where planning time is a substantial part of the runtime
(like in your thread on OLTP star joins), a user could lower the
threshold for greedy search, since it's fast. That's not an option
currently because GEQO is slow.
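
For illustration, the core greedy idea (GOO-style: repeatedly join the
pair of relations with the smallest estimated result) can be sketched
roughly like this. The cardinalities and selectivities are made-up toy
numbers, and the independence assumption is a simplification; none of
this reflects the planner's actual data structures:

```python
from itertools import combinations

def est_join_rows(r1, r2, rows, sel):
    # Naive independence assumption: multiply the selectivities of all
    # base-relation pairs that have a join predicate across the two sides.
    s = 1.0
    for a in r1:
        for b in r2:
            s *= sel.get(frozenset((a, b)), 1.0)
    return rows[r1] * rows[r2] * s

def goo(base_rows, sel):
    """Greedy operator ordering: repeatedly merge the pair of
    (composite) relations whose join is estimated smallest."""
    rows = {frozenset([r]): n for r, n in base_rows.items()}
    order = []
    while len(rows) > 1:
        r1, r2 = min(combinations(rows, 2),
                     key=lambda p: est_join_rows(p[0], p[1], rows, sel))
        rows[r1 | r2] = est_join_rows(r1, r2, rows, sel)
        del rows[r1], rows[r2]
        order.append((sorted(r1), sorted(r2)))
    return order
```

Each step is a pairwise scan, so the whole search is polynomial, which
is why the threshold for it could afford to be much lower than
geqo_threshold is today.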

With a dynamic strategy, I imagine join_collapse_limit would mean
something different (one technique in the literature is to refine the
greedy output on subproblems of size 4-7), or be replaced with a
different tuning knob.
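
The refine-on-small-subproblems idea could look something like this toy
sketch: take the greedy output as a left-deep order, then exhaustively
re-order each sliding window of k consecutive relations (k in the 4-7
range). The cost function and window scheme here are my own stand-ins
for illustration, not anything the planner does:

```python
from itertools import permutations

def refine(order, k, cost):
    """Locally improve a left-deep join order by exhaustively
    re-ordering each window of k consecutive relations.
    `order` is a list of base relation names; `cost` scores a
    complete order (lower is better)."""
    best = list(order)
    for i in range(len(best) - k + 1):
        window = best[i:i + k]
        for perm in permutations(window):
            cand = best[:i] + list(perm) + best[i + k:]
            if cost(cand) < cost(best):
                best = cand
    return best
```

With k equal to the problem size this degenerates to exhaustive search;
with small k it stays cheap but can get stuck at a local optimum, which
is the usual trade-off for this family of techniques.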

Anyway, these questions reinforce that this is potential follow-up work.

> Also, couldn't this easily lead to unpredictable planning behavior? Say
> the budget relies on counting "make_join_rel" calls, or something like
> that. Then someone adds or removes an index on one of the tables, and we
> suddenly start to consider fewer/more candidate orders. Or maybe someone
> refactors the code a little bit, moving the calls. And suddenly, we hit
> the threshold at a different place. But maybe that's OK for a heuristic?

That's a good question.

--
John Naylor
Amazon Web Services
