Re: Add a greedy join search algorithm to handle large join problems

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2025-12-02 05:36:50
Message-ID: CAFiTN-uOfES2p_ukrX+eTB+G+_K2zQ+_Y2tAXK2ffpbS-JDpzw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 2, 2025 at 9:18 AM Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> wrote:
>
> Hi hackers,
>
> This patch implements GOO (Greedy Operator Ordering), a greedy
> join-order search method for large join problems, based on Fegaras (DEXA
> ’98) [1]. The algorithm repeatedly selects, among all legal joins, the
> join pair with the lowest estimated total cost, merges them, and
> continues until a single join remains. Patch attached.

Interesting.

> To get an initial sense of performance, I reused the star join /
> snowflake examples and the testing script from the thread in [2]. The
> star-join GUC in that SQL workload was replaced with
> `enable_goo_join_search`, so the same tests can run under DP (standard
> dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
> tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
> GOO. Other planner settings, including join_collapse_limit, remained at
> their defaults.
>
> On my local machine, a single-client pgbench run produces the following
> throughput (tps):
>
> | DP | GEQO | GOO
> --------------------+----------+----------+-----------
> starjoin (inner) | 1762.52 | 192.13 | 6168.89
> starjoin (outer) | 1683.92 | 173.90 | 5626.56
> snowflake (inner) | 1829.04 | 133.40 | 3929.57
> snowflake (outer) | 1397.93 | 99.65 | 3040.52
>

Is pgbench the right workload to test this, I mean what are we trying
to compare here the planning time taken by DP vs GEQO vs GOO or the
quality of the plans generated by different join ordering algorithms
or both? All pgbench queries are single table scans and there is no
involvement of the join search, so I am not sure how we can justify
these gains?

--
Regards,
Dilip Kumar
Google

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2025-12-02 05:52:21 Re: Inline non-SQL SRFs using SupportRequestSimplify
Previous Message Amit Kapila 2025-12-02 05:15:42 Re: Proposal: Conflict log history table for Logical Replication