Re: Add a greedy join search algorithm to handle large join problems

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2025-12-02 10:56:13
Message-ID: 6db6d2ec-7529-4add-9a95-178fc318311d@vondra.me
Lists: pgsql-hackers

On 12/2/25 04:48, Chengpeng Yan wrote:
> Hi hackers,
>
> This patch implements GOO (Greedy Operator Ordering), a greedy
> join-order search method for large join problems, based on Fegaras (DEXA
> ’98) [1]. The algorithm repeatedly selects, among all legal joins, the
> join pair with the lowest estimated total cost, merges them, and
> continues until a single join remains. Patch attached.
>
> To get an initial sense of performance, I reused the star join /
> snowflake examples and the testing script from the thread in [2]. The
> star-join GUC in that SQL workload was replaced with
> `enable_goo_join_search`, so the same tests can run under DP (standard
> dynamic programming) / GEQO (Genetic Query Optimizer) / GOO. For these
> tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
> GOO. Other planner settings, including join_collapse_limit, remained at
> their defaults.
>
> On my local machine, a single-client pgbench run produces the following
> throughput (tps):
>
>                     |    DP    |   GEQO   |    GOO
> --------------------+----------+----------+-----------
> starjoin    (inner) |  1762.52 |  192.13  |  6168.89
> starjoin    (outer) |  1683.92 |  173.90  |  5626.56
> snowflake   (inner) |  1829.04 |  133.40  |  3929.57
> snowflake   (outer) |  1397.93 |   99.65  |  3040.52
>
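For readers following along, the greedy loop described in the quoted message can be sketched roughly as below. This is only a toy illustration, not the patch's code: the function name `goo_join_order`, the input shapes, and the cost model (the estimated row count of the join result stands in for "estimated total cost") are all assumptions made for the example. A join is treated as legal only when at least one join predicate connects the two sides.

```python
from itertools import combinations


def goo_join_order(rows, selectivity):
    """Toy greedy join ordering (hypothetical helper, not from the patch).

    rows:        {relation name: estimated row count}
    selectivity: {frozenset({a, b}): selectivity of the a-b join predicate}
    Returns (nested tuple describing the join tree, estimated result rows).
    """
    # Each subplan is (set of base relations, join tree, estimated rows).
    subplans = [(frozenset([name]), name, float(n)) for name, n in rows.items()]

    def join_estimate(left, right):
        # Multiply the selectivities of all predicates crossing the two sides.
        # Returns None when no predicate connects them (cross product).
        sel, connected = 1.0, False
        for a in left[0]:
            for b in right[0]:
                s = selectivity.get(frozenset((a, b)))
                if s is not None:
                    sel *= s
                    connected = True
        return left[2] * right[2] * sel if connected else None

    while len(subplans) > 1:
        best = None
        # Consider every legal pair and keep the cheapest one (toy cost = rows).
        for i, j in combinations(range(len(subplans)), 2):
            est = join_estimate(subplans[i], subplans[j])
            if est is not None and (best is None or est < best[0]):
                best = (est, i, j)
        if best is None:
            raise ValueError("join graph is disconnected")
        est, i, j = best
        left, right = subplans[i], subplans[j]
        merged = (left[0] | right[0], (left[1], right[1]), est)
        # Replace the two inputs with the merged subplan and repeat.
        subplans = [p for k, p in enumerate(subplans) if k not in (i, j)]
        subplans.append(merged)

    return subplans[0][1], subplans[0][2]


plan, est_rows = goo_join_order(
    {"a": 1000, "b": 10, "c": 100},
    {frozenset(("a", "b")): 0.5, frozenset(("b", "c")): 0.25},
)
```

With these made-up inputs the sketch joins `b` and `c` first (250 estimated rows versus 5000 for `a` and `b`), then joins `a` to that result. Each round scans all remaining pairs, so the loop stays polynomial in the number of relations, which is the appeal over exhaustive search for large join problems.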

Seems interesting, and also much more ambitious than what I intended to
do in the starjoin thread (which is meant to be just a simple
heuristic on top of the regular join order planning).

I think a much broader evaluation will be needed, comparing not just the
planning time, but also the quality of the final plan. For the starjoin
tests that does not really matter, as the plans are all equal in this
regard.
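
As a rough illustration of what such a comparison could look at (this is not part of the patch or of any existing test script), the sketch below pulls planning time, execution time, and estimated total cost out of `EXPLAIN (ANALYZE, FORMAT JSON)` output and reports each strategy relative to a baseline. The helper names `summarize` and `compare` are hypothetical, and the sample numbers are invented purely so the example runs.

```python
import json


def summarize(explain_json):
    """Extract timings (ms) and estimated cost from EXPLAIN (ANALYZE, FORMAT JSON)."""
    doc = json.loads(explain_json)[0]
    return {
        "planning_ms": doc["Planning Time"],
        "execution_ms": doc["Execution Time"],
        "total_cost": doc["Plan"]["Total Cost"],
    }


def compare(results, baseline="DP"):
    """Express each strategy's planning and execution time relative to the baseline."""
    base = results[baseline]
    return {
        name: {
            "planning_ratio": r["planning_ms"] / base["planning_ms"],
            "execution_ratio": r["execution_ms"] / base["execution_ms"],
        }
        for name, r in results.items()
    }


# Invented sample documents, trimmed to the keys the sketch reads.
sample_dp = json.dumps([{"Plan": {"Node Type": "Hash Join", "Total Cost": 500.0},
                         "Planning Time": 10.0, "Execution Time": 100.0}])
sample_goo = json.dumps([{"Plan": {"Node Type": "Nested Loop", "Total Cost": 900.0},
                          "Planning Time": 2.5, "Execution Time": 200.0}])

results = {"DP": summarize(sample_dp), "GOO": summarize(sample_goo)}
ratios = compare(results)
```

In this invented case the greedy plan is found four times faster but runs twice as long, which is exactly the kind of trade-off that planning-time throughput alone would hide.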

regards

--
Tomas Vondra
