Re: Add a greedy join search algorithm to handle large join problems

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2025-12-10 10:20:14
Message-ID: e82d4302-2450-4915-93a5-7df75f69c385@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/10/25 06:20, Chengpeng Yan wrote:
> Hi,
>
>> On Dec 10, 2025, at 07:30, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> I looked at a couple more failing queries, and removing the aggregates
>> fixes them too. Maybe there are other issues/crashes, of course.
>
> Thanks a lot for pointing this out. I also noticed the same issue when
> testing TPC-H Q5. The root cause was that in the goo algorithm I forgot
> to handle the case of eager aggregation. This has been fixed in the v2
> patch (after the fix, the v2 version works correctly for all TPC-H
> queries). I will continue testing it on TPC-DS as well.
>

I can confirm v2 makes it work for planning all 99 TPC-DS queries, i.e.
there are no more crashes during EXPLAIN.

Comparing the plans from master/geqo and master/goo, I see about 80% of
them changed. I haven't done any evaluation of how good the plans are,
really. I'll see if I can get some numbers next, but it'll take a while.

It's also tricky as plan choices depend on GUCs like random_page_cost,
and if those values are not good enough, the optimizer may still end up
with a bad plan. I'm not sure what's the best approach.

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master: 8s
master/geqo: 20s
master/goo: 5s

Where master/geqo means

geqo_threshold=2

and master/goo means

geqo_threshold=2
enable_goo_join_search = on

It's nice that "goo" seems to be faster than "geqo" - assuming the plans
are comparable or better. But it surprised me switching to geqo makes it
slower than master. That goes against my intuition that geqo is meant to
be cheaper/faster join order planning. But maybe I'm missing something.

> Sorry that I didn’t push the related changes earlier. I was running more
> experiments on the greedy strategy, and I still needed some time to
> organize and analyze the results. During this process, I found that the
> current greedy strategy may lead to suboptimal plan quality in some
> cases. Because of that, I plan to first evaluate a few basic greedy
> heuristics on TPC-H to understand their behavior and limitations. If
> there are meaningful results or conclusions, I will share them for
> discussion.
>
> Based on some preliminary testing, I’m leaning toward keeping the greedy
> strategy simple and explainable, and focusing on the following
> indicators together with the planner’s cost estimates:
> * join cardinality (number of output rows)
> * selectivity (join_size / (left_size * right_size))
> * estimated result size in bytes(joinrel->reltarget->width * joinrel->rows)
> * cheapest total path cost (cheapest_total_path->total_cost)
>
> At the moment, I’m inclined to prioritize join cardinality with input
> size as a secondary factor, but I’d like to validate this step by step:
> first by testing these simple heuristics on TPC-H (as a relatively
> simple workload) and summarizing some initial conclusions there. After
> that, I plan to run more comprehensive experiments on more complex
> benchmarks such as JOB and TPC-DS and report the results.
>
> Do you have any thoughts or suggestions on this direction?
>
> Thanks again for your feedback and help.
>

No opinion.

IIUC it's a simplified heuristics, replacing the "full" join planning
algorithm. So inevitably there will be cases when it produces plans that
are "worse" than the actual join order planning. I don't have a great
intuition what's the right trade off yet. Or am I missing something?

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2025-12-10 10:21:53 Re: Skipping schema changes in publication
Previous Message 高雪玉 2025-12-10 10:11:46 Re:Propose: Adding a '--enable-failover' option to 'pg_createsubscriber'