Re: Add a greedy join search algorithm to handle large join problems

From: Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2025-12-28 02:54:16
Message-ID: AEE7DBDE-8778-4AAC-8890-ED2B936C0854@Outlook.com
Lists: pgsql-hackers


> On Dec 28, 2025, at 01:00, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> You realize this does not actually change shared buffers size, right?
> Because that requires a restart, not just pg_reload_conf. Are you sure
> you're running with 4GB shared buffers?

Good catch, and sorry for not mentioning that explicitly. I did restart
the instance and verified that the new shared_buffers setting was in
effect. I’ll make sure to call this out clearly in future descriptions.

> I'm not sure SF=1 is sufficient for making any clear conclusions. No
> matter what the shared_buffers value is, it's going to fit into RAM, the
> runs are going to be fully cached.

Thanks for the comment. I agree that TPC-H at SF=1 is too small to
support strong conclusions. I mainly used SF=1 as a convenient starting
point to quickly validate ideas and iterate on the implementation. I’ve
already run experiments on JOB and plan to share those results next,
where the behavior should be more representative and the conclusions
more meaningful.

> I don't quite see how we could make good decisions if the estimates are
> this wrong. Garbage in, garbage out. AFAICS this affects both the
> regular DP planning, which heavily relies on the costing, but also the
> approaches on heuristics, which still rely on estimates in some way.
>
> If the aggregate subquery is 1000x off, why should any of the following
> decisions be right? The approaches may have different tolerance for
> estimation errors - some may be "defensive" and handle misestimates
> better, but the trade off is that it may pick slower plans when the
> estimates are exact.
>
> I don't think there's an "optimal" approach picking the best plan in
> both cases. If there was, join ordering wouldn't be such a challenge.
>
> I don't think Q20 is about greediness. Poor estimates are going to be an
> issue for all approaches relying on them in some way.
>
> The issue with Q7 seems pretty inherent to greedy algorithms. Picking
> solutions that are optimal locally but not globally is the definition of
> "greedy". I don't think it matters which exact metric is used, this flaw
> is built-in. And I don't think it's solvable.

I agree that all approaches are affected by estimation errors, though, as
you noted, they differ in how much error they can tolerate. In its
current form, GOO is still quite fragile in this respect and can regress
severely when estimates are badly off. Further experiments have also
convinced me that any single greedy rule inevitably has cases where it
performs very poorly, so my current direction is no longer to tune the
greedy criterion itself, but to make the existing greedy choices more
robust through additional mechanisms.

Regarding Q20, I agree that this is not fundamentally a “greedy” issue.
If the rowcount estimates were reasonably accurate, the extreme behavior
observed there would likely be avoided. At the same time, when estimates
are wrong by orders of magnitude, the current GOO approach does not
handle this well, and improving robustness under such severe
misestimation is exactly one of the main aspects I am currently working
on.
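To make the Q20 point concrete: because row counts combine multiplicatively, a relative error in one input estimate survives every subsequent join unchanged, so any greedy rule ranking candidates by those numbers inherits the full error. A toy illustration with invented numbers (not the patch's code):

```python
# Toy illustration: a 1000x misestimate on a base input is still a
# 1000x misestimate at the top of the join chain, since each join only
# multiplies the row count by some fanout.

def chain_rows(base_rows, fanouts):
    """Rows after joining a base relation through a chain of joins,
    each join multiplying the row count by a fixed fanout."""
    rows = base_rows
    for f in fanouts:
        rows *= f
    return rows

true_rows = 1_000_000
est_rows = 1_000              # hypothetical 1000x underestimate
fanouts = [0.1, 2.0, 0.5]     # per-join multipliers, identical either way

ratio = chain_rows(true_rows, fanouts) / chain_rows(est_rows, fanouts)
# the initial 1000x error is undiminished after three joins
```

This is why I see robustness to misestimation, rather than a cleverer greedy metric, as the main lever here.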

I have already run additional experiments in this direction, using
result_size and cost as the base signals, and will share the results
separately.

> Sufficient for what? Sufficient to replace DP or to replace GEQO?
>
> I don't think we'd want to replace DP with such greedy algorithm. With
> accurate estimates and modest number of joins, we should be able to find
> the best join (more or less).
>
> But this thread is not about replacing DP, I see GOO more as a GEQO
> replacement, right? Or did the goal change?

The goal has always been to use GOO as a replacement for GEQO, not DP.

> I'm not sure ignoring selectivity entirely is a good plan, though. Yes,
> if the selectivity/estimate is significantly off, the plan may be poor.
> But what exactly do you plan to use instead? Isn't a lot of the metrics
> (output size, ...) derived from the selectivity/estimate anyway?
>
> I agree the JOB benchmark may be a better fit for this. The TPC-H is
> pretty simple, and even for TPC-DS the number of joins is not all that
> high. Sorry if my initial feedback suggested these are the proper
> benchmarks to evaluate this.
>
> I suggest the evaluation should focus on cases where we expect GOO to
> actually be used in practice - as a replacement for GEQO. Only when it
> proves useful/acceptable for that use case (for sufficiently many joins
> etc.), we can start comparing with DP to figure out if we need to adjust
> the thresholds and so on.
>
> Another suggestion is to also test with larger data sets. The problems
> with SF=10 or SF=100 may be very different, even with TPC-H. Also,
> consider testing with "cold" cases, as if there's nothing cached by
> restarting the Postgres instance and drop page cache between runs.

Thanks for the feedback.

I’ve already run experiments on the JOB benchmark and will share the
results shortly, including a comparison against GEQO with DP as a
baseline. I agree the primary goal at this stage is to show that the
approach is good enough for the intended use case; secondary aspects
such as threshold tuning can come after that.

Larger scale factors and cold-cache runs are also part of my upcoming
evaluation plan, and I agree those are important dimensions to cover.

Regarding selectivity, I agree the situation is not entirely clear-cut.
As you noted, result_size is itself derived from selectivity to some
extent, and many of the bad cases share similar characteristics.
However, in my initial TPC-H SF=1 experiments, selectivity-based greedy
rules already produced many poor plans, and I expect this to get worse
in more complex scenarios. For now I have therefore focused on cost and
result_size: cost because it is a first-class concept in PostgreSQL, and
result_size because the literature commonly cites it as a more robust
signal for greedy join ordering. If you think selectivity is still worth
evaluating, I can add experiments for it, but for the reasons above I
have not pursued further selectivity-focused tests so far.
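For concreteness, the result_size-based greedy rule follows the usual GOO shape: repeatedly merge the pair of relations whose join is estimated to produce the fewest rows. A minimal Python sketch (the sizes and uniform selectivity below are invented for illustration; the actual patch works against PostgreSQL's RelOptInfo sizes and clause selectivities, not a user-supplied function):

```python
# GOO-style greedy loop: at each step, merge the pair of (sets of)
# relations whose estimated join output is smallest.

def goo_order(sizes, selectivity):
    """sizes: {rel_name: rows}; selectivity(a, b) -> estimated join
    selectivity between two frozensets of relations. Returns a nested
    tuple describing the chosen join order."""
    rels = {frozenset([r]): n for r, n in sizes.items()}
    plans = {s: next(iter(s)) for s in rels}  # leaf "plans" are names
    while len(rels) > 1:
        # Greedy step: pick the pair whose join output is smallest.
        a, b = min(
            ((x, y) for x in rels for y in rels if x != y),
            key=lambda p: rels[p[0]] * rels[p[1]] * selectivity(p[0], p[1]),
        )
        merged = a | b
        rels[merged] = rels[a] * rels[b] * selectivity(a, b)
        plans[merged] = (plans.pop(a), plans.pop(b))
        del rels[a], rels[b]
    return next(iter(plans.values()))

# Toy example: joining "a" and "b" first yields the fewest rows, so the
# greedy rule commits to it immediately and never revisits the choice.
order = goo_order({"a": 1000, "b": 10, "c": 100000},
                  lambda x, y: 0.01)   # uniform selectivity, invented
```

The sketch also shows the flaw you pointed out for Q7: once a pair is merged, the decision is final, which is exactly where I hope additional robustness mechanisms can help.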
>
> I realized the paper mentioned at the beginning of this thread is from
> 1998. That doesn't make it wrong, but I was wondering if there are some
> newer papers about join order search, with interesting
> alternative/better approaches.
>
> An interesting paper I found is this CIDR21 paper:
>
> Simplicity Done Right for Join Ordering
> Axel Hertzschuch, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
> https://vldb.org/cidrdb/2021/simplicity-done-right-for-join-ordering.html
> https://vldb.org/cidrdb/papers/2021/cidr2021_paper01.pdf
>
> Which does something similar to our approach, although it seems to be
> more a replacement for the DP and not just for GEQO. It's meant to be
> cheaper, but also more resilient to poor join orders, as it cares about
> "upper bound" (~worst case) for the join orders.
>
> It goes beyond the scope of GOO, as the paper also talks about sampling
> to determine some of the estimates. But maybe it'd be useful (better
> than GEQO) even without that?
>
> I find the focus on the "worst case" (and only trusting baserel
> estimates) interesting. I wonder if it might help with the common
> nestloop problem, where we opt to believe a very optimistic low
> cardinality estimate, only to end with a sequence of nestloops that
> never complete.

Thanks for the pointer. I’ve also been looking at more recent work on
join ordering, and there are indeed several ideas that seem relevant to
the current direction, including the paper you mentioned. These are
definitely directions I plan to consider more seriously.
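On the "upper bound" idea specifically, my reading so far (from the paper, not from any code) is that the key trick is to bound a join's output using per-key frequency information instead of trusting a point estimate. A tiny sketch of that bound, with an invented function name and numbers:

```python
# Pessimistic join-size bound in the spirit of the CIDR'21 paper: each
# row of R can match at most maxfreq_s rows of S (the highest frequency
# of any join-key value in S), and symmetrically for S against R.

def join_upper_bound(rows_r, rows_s, maxfreq_r, maxfreq_s):
    return min(rows_r * maxfreq_s, rows_s * maxfreq_r)

# With 1,000 rows in R, 100,000 rows in S, and at most 3 duplicates of
# any join-key value in S, the join output cannot exceed 3,000 rows,
# however wrong a point estimate might be.
bound = join_upper_bound(1000, 100_000, 5, 3)
```

Ranking candidate joins by such a bound rather than by an optimistic estimate does seem like it could address the nestloop failure mode you describe.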

My current plan is to first establish a reasonably solid baseline using
relatively simple techniques: reducing extreme regressions and bringing
overall plan quality closer to DP. Once that is in place, I’d like to
step back, summarize which recent papers and ideas seem most applicable,
and then experiment with more advanced approaches incrementally on top
of that foundation.

Thanks for all the detailed feedback and the many useful insights
throughout this discussion.

--
Best regards,
Chengpeng Yan
