Re: Add a greedy join search algorithm to handle large join problems

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, lakshmi <lakshmigcdac(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-05-05 05:40:13
Message-ID: CANWCAZbbTazxGeMU=qdyi1kBr_Nkjv1n6vZR-hW30QbqVqkx1Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 3, 2026 at 7:46 PM Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
> > On Feb 14, 2026, at 13:39, Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
> > 1. Continue evaluating plan quality on more datasets/workloads. I’ve
> > already collected several candidate tests: some are JOB-based
> > variants, and others are synthetic workloads. Next, I plan to
> > consolidate these into a unified test set (with reproducible
> > setup/details), publish it, and run broader comparative evaluation.

> I ran the current pure-GOO variants on JOB and JOB-Complex [1], 143
> queries in total. JOB-Complex uses the same IMDB/JOB setting, but adds
> harder predicates and more challenging join patterns. The tested

Thanks for those results. As Tomas mentioned above, evaluation should
focus on cases where DP won't be used. Joins with a small number of
relations aren't going to tell us anything, especially since (I think)
GEQO will at times accidentally cover the entire search space. Indeed,
there are quite a few queries where GOO is worse than GEQO by around
2-3x, but also small enough to be handled by DP anyway, so reporting
them is a distraction.

Less than half of the JOB-Complex queries have at least 12 "joins"
(BTW, is that number in the pdf actual joins or is it base
relations?), but the results (and summary results) include small joins
as well. The cases where one of GOO/GEQO is much better than the
other do seem to occur with large join problems.

> GOO(combined) with cost and result_size may already be a useful baseline
> or candidate generator, but the regressions above do not support making
> it the default GEQO replacement. One possible advantage over GEQO is
> that GOO may work better with pg_plan_advice, but I need to understand
> the details better before making a stronger claim.

Given that all heuristic join enumeration methods can produce
spectacularly bad plans, the ability to influence the plan is more
crucial with large join problems than with small ones. Features should be
orthogonal in general, and in this case, integrating well with plan
advice seems like a strong deciding factor.

--
John Naylor
Amazon Web Services
