Re: Add a greedy join search algorithm to handle large join problems

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: John Naylor <johncnaylorls(at)gmail(dot)com>, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
Cc: lakshmi <lakshmigcdac(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-05-12 13:42:32
Message-ID: a43d5bc7-4676-48e0-a5ae-d01f29fb97e6@vondra.me
Lists: pgsql-hackers

On 5/12/26 13:51, John Naylor wrote:
> On Mon, May 11, 2026 at 9:33 PM Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
>
>> ...
>> I also noticed a statistics-sensitivity issue while rerunning the
>> large-query subset. I ran five rounds over the same 33 queries and the
>> same DP / GEQO / GOO(combined) variants. Each round refreshed statistics
>> before executing the measured queries. I saved the statistics snapshot
>> with pg_dump --statistics-only and kept the per-round result
>> spreadsheets; these are in round_artifacts-20260511.zip, organized as
>> round1 through round5. The per-run measured repetitions are
>> comparatively stable; the large differences below correspond to changed
>> plans after refreshed statistics, not just measurement noise.
>>
>> Execution-time sum:
>>
>> | round | DP ms | GEQO ms | GOO(combined) ms |
>> | --- | ---: | ---: | ---: |
>> | round1 | 41091 | 574582 | 87706 |
>> | round2 | 41331 | 51205 | 85038 |
>> | round3 | 40493 | 95580 | 85561 |
>> | round4 | 41464 | 99228 | 83930 |
>> | round5 | 41611 | 126544 | 85578 |
>>
>> GOO(combined) has the lower execution-time sum in four of the five
>> rounds. The exact bucket counts are not identical in every round, but
>> each round still has both large GOO(combined) wins and large
>> regressions. More importantly, GEQO varies much more across statistics
>> snapshots: its execution-time sum ranges from 51s to 575s, while
>> GOO(combined) stays in the 84s-88s range.
>
> That's an interesting finding! We'll want to keep this behavior in
> mind and see how reproducible it is. You mentioned you didn't want to
> draw any general conclusions (reasonable), but a 10x variation just
> from statistics does put a new perspective on the test results.
>

After reading this, my thinking was "we should assume the estimates are
accurate" because with bogus estimates we can end up with arbitrarily
bad plans. Garbage in, garbage out.

But maybe it's not as black-and-white? I agree that if one method is
much more robust (i.e. it still performs well when the estimates are
merely close to the actual values, not exact), then that seems like an
important property for this type of heuristic.

I wonder how "bad" the estimates are in the presented example, both in
the good and bad runs.

>> As I mentioned in the previous message, the remaining GOO(combined)
>> regressions fall into two groups: sometimes neither cost nor result_size
>> finds a good greedy candidate, and sometimes a good greedy candidate
>> exists but the final estimated-cost selector chooses a slower plan.
>>
>> So I am still looking at the two follow-up directions mentioned earlier,
>> both building on the current GOO work. One is to improve pure-GOO
>> candidate generation and the final selector, for example by adding more
>> diverse strategies and making the selector consider signals beyond final
>> estimated cost. The other is to use exact DP for a prefix of the problem
>> and fall back to GOO when the DP budget is not enough. The first
>> direction may need broader changes across planner or estimation-related
>> code,
>
> I agree we should try to avoid strategies that depend on changing other areas.
>

IMHO it's futile to search for a perfect heuristic. If we had that, we
wouldn't need the DP mode at all. This can't be the criterion, because
no heuristic would pass it.

Regarding the two proposed directions, it's not very clear to me why the
first direction would require broader changes, particularly changes that
would be unnecessary for the second direction (DP for prefix).

To me it seems the second approach is actually an advanced version of
the first one. How do you pick the prefix to handle by DP, if not by
some heuristic requiring these new signals?

>> so I am currently focusing more on the second direction and will
>> share the code and numbers once I have a clearer result.
>
> I've mentioned elsewhere that graceful degradation is a good property
> (and a prerequisite for robustness), and cliff behavior is one
> disadvantage of the GEQO search. However, it's not going to scale:
> There will always be large join problems where a heuristic search will
> find terrible plans, and that means severe regressions are possible,
> as we've seen. Further, this direction opens up another large design
> space that I fear will complicate testing and review, and it's already
> hard enough.
>

TBH I don't quite understand what the proposed approach of "using exact
DP for a prefix of the problem" is meant to do. As of now we split the
join problem into smaller parts based on join_collapse_limit (=8). If
these problems are "too large" for DP (geqo_threshold=12), we use the
heuristics (GEQO) mode. Or do I misremember this?

Maybe I'm just "too used" to this, but this seems reasonable to me. Try
searching for a "perfect" solution first (within reason), and only for
large problems fall back to something approximate. Sensible, no?

To get to the GEQO/GOO code, people have to adjust the limits so that
(join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
so most join problems are smaller than geqo_threshold and thus handled
by regular DP (for the smaller subproblems).
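
Just to spell out the settings I mean (real GUCs, with the stock
defaults mentioned above; the SET only illustrates what it takes to
reach the heuristic search for a query written with explicit JOINs):

    -- defaults: explicit JOINs are collapsed into join problems of at
    -- most 8 relations, and GEQO only takes over at 12+ FROM items
    SHOW join_collapse_limit;     -- 8
    SHOW geqo_threshold;          -- 12

    -- to reach the GEQO/GOO code the collapse limit has to exceed the
    -- GEQO threshold, e.g.
    SET join_collapse_limit = 20;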

But let's say someone adjusts the GUCs, gets to GOO, and it handles a
prefix of the problem using the DP approach. How is that different from
keeping (join_collapse_limit < geqo_threshold) and not even getting to
GOO? Why not just leave join_collapse_limit at a low value?

> So, the above doesn't seem like a logical next step from the data thus
> far. (To be clear, I think it's good future work, but that presumes
> getting to a first commit).
>
> To my mind, we still need more data to inform future direction. The
> most important things IMO are
> - plan advice -- My current thinking is, the plan advice aspect is the
> "wild card" that could immediately improve user experience with large
> joins. Conversely, without the ability to fix regressions, any new
> approach is risky. How well does plan advice work with GEQO? Does it
> work with GOO already or would that need additional coding?
> - more large join benchmarks (I see you have additional benchmarks in
> your repository above)
>

Right. When replacing one heuristic with a different one, there will
almost certainly be regressions. Each heuristic explores a slightly
different subset of the solution space, and it's a matter of luck which
one finds a substantially better solution.

I don't have a fully formed idea how to evaluate this, but I think the
only way to "prove" the new heuristic is better is to test a lot of
complex join queries and look at the overall statistics: how long the
planning took, how many queries got faster/slower, etc.
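
Per query that's easy enough to collect, e.g. with something like:

    -- reports both "Planning Time" and "Execution Time", which is
    -- enough to build the faster/slower statistics per method
    EXPLAIN (ANALYZE, SUMMARY) SELECT ... ;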

The JOB is certainly one option for that, and it's valuable because the
queries are meant to be realistic / taken from an actual application.
But there are not all that many of them.

I think it would be useful to write a script that generates joins of
arbitrary complexity (number of relations, how connected they are), and
see how the different approaches perform on those. It could even
generate the data so that the estimates have adjustable inaccuracy.
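
Something along these lines, just to illustrate the shape of it: a
trivial chain-join generator (the t1..tn tables and their id/parent_id
columns are made up, and the real script would also create and populate
the tables, vary the join graph shape, and skew the data):

    -- builds the FROM/JOIN clause for a chain of n relations (n >= 2),
    -- assuming tables t1..tn, each with columns "id" and "parent_id"
    CREATE FUNCTION chain_join(n int) RETURNS text AS $$
      SELECT 't1' || string_agg(
               format(' JOIN t%s ON t%s.parent_id = t%s.id', i, i, i - 1),
               '' ORDER BY i)
      FROM generate_series(2, n) AS i;
    $$ LANGUAGE sql;

    -- e.g. SELECT 'SELECT count(*) FROM ' || chain_join(20);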

regards

--
Tomas Vondra
