| From: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com> |
|---|---|
| To: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Subject: | Add a greedy join search algorithm to handle large join problems |
| Date: | 2025-12-02 03:48:26 |
| Message-ID: | 3FF63E99-AB4F-41A9-BC78-AAB28823FBD0@Outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
This patch implements GOO (Greedy Operator Ordering), a greedy
join-order search method for large join problems, based on Fegaras (DEXA
’98) [1]. The algorithm repeatedly selects, among all legal joins, the
join pair with the lowest estimated total cost, merges them, and
continues until a single join remains. Patch attached.
To get an initial sense of performance, I reused the star join /
snowflake examples and the testing script from the thread in [2]. The
star-join GUC in that SQL workload was replaced with
`enable_goo_join_search`, so the same tests can run under DP (standard
dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
GOO. Other planner settings, including join_collapse_limit, remained at
their defaults.
On my local machine, a single-client pgbench run produces the following
throughput (tps):
| DP | GEQO | GOO
--------------------+----------+----------+-----------
starjoin (inner) | 1762.52 | 192.13 | 6168.89
starjoin (outer) | 1683.92 | 173.90 | 5626.56
snowflake (inner) | 1829.04 | 133.40 | 3929.57
snowflake (outer) | 1397.93 | 99.65 | 3040.52
Open questions:
* The paper ranks joins by estimated result size, while this prototype
reuses the existing planner total_cost. This makes the implementation
straightforward, but it may be worth exploring row-based rule.
* The prototype allocates through multiple memory contexts, aiming to
reduce memory usage during candidate evaluation. Suggestions on
simplification are welcome.
* Planning performance vs GEQO: On the star/snowflake benchmarks above,
GOO reduces planning time significantly compared to GEQO. Broader
evaluation on more representative workloads (e.g., TPC-H / TPC-DS /
JOB) is TODO, covering both planning speed and plan quality.
* There is no tuning knob like `geqo_seed` for GEQO if GOO produces a
poor ordering.
* The long-term integration is open:
* fully replace GEQO,
* keep GEQO available and optionally try both,
* or use GOO as a starting point for GEQO.
Comments and review would be appreciated.
References:
[1] Leonidas Fegaras, “A New Heuristic for Optimizing Large Queries”,
DEXA ’98. ACM Digital Library:
https://dl.acm.org/doi/10.5555/648311.754892 A publicly accessible
PostScript version is available from the author's page:
https://lambda.uta.edu/order.ps
[2] Star/snowflake join thread and benchmarks:
https://www.postgresql.org/message-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51%40vondra.me
--
Best regards,
Chengpeng Yan
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Add-GOO-Greedy-Operator-Ordering-join-search-as-a.patch | application/octet-stream | 60.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kirill Reshke | 2025-12-02 04:02:34 | Re: freespace buffer locking |
| Previous Message | Kirill Reshke | 2025-12-02 03:32:44 | Re: [PATCH] Add hint for misspelled relations |