Re: Add a greedy join search algorithm to handle large join problems

From: lakshmi <lakshmigcdac(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: 2026-02-16 11:26:13
Message-ID: CAEvyyTjuxiaL4_brvTsA3DZVcq5qgZS=yxFyNqoMtQWVZNiT1A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

Thank you for the clarification. You’re right that the synthetic test
doesn’t strongly constrain join order and, with the default
join_collapse_limit , the DP comparison was limited.

I’m now preparing follow-up tests using selected JOB queries and adjusted
planner settings to provide a fair and more realistic comparison. I’ll
share the results soon.

Regards,
Lakshmi

On Mon, Feb 16, 2026 at 4:27 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:

>
>
> On 2/16/26 11:44, lakshmi wrote:
> > Hi Tomas,
> >
> > Thank you for the question.
> > The 15-table and 20-table results I shared were obtained using a
> > synthetic join workload designed to stress join-order planning and
> > measure planning-time scaling, rather than a JOB or TPC-H query.
> > Each query is essentially a left-deep chain of equality joins over
> > simple tables. For reference, the structure is equivalent to:
> >
>
> Isn't the left-deep shape more a feature of the plan than the query?
> With all the joins using the same "id" attribute, there will be one huge
> equivalence class, so AFAICS this does not really force any particular
> order (and we could easily pick a bushy plan).
>
> >
> > 15-table join
> >
> > SELECT count(*)
> >
> > FROM t1
> >
> > JOIN t2 ON t1.id <http://t1.id> = t2.id <http://t2.id>
> >
> > JOIN t3 ON t2.id <http://t2.id> = t3.id <http://t3.id>
> >
> > ...
> >
> > JOIN t15 ON t14.id <http://t14.id> = t15.id <http://t15.id>;
> >
> >
> >
> >
> > 20-table join
> >
> >
> >
> > SELECT count(*)
> >
> > FROM t1
> >
> > JOIN t2 ON t1.id <http://t1.id> = t2.id <http://t2.id>
> >
> > JOIN t3 ON t2.id <http://t2.id> = t3.id <http://t3.id>
> >
> > ...
> >
> > JOIN t20 ON t19.id <http://t19.id> = t20.id <http://t20.id>;
> >
> >
> > Regarding planner settings:
> >
> >
> > -geqo_thresholdwas set to:
> >
> > a high value (e.g., 100) to force DP
> >
> > a low value (e.g., 2) to allow GEQO/GOO
> >
> >
> >
> > -enable_goo_join_searchwas toggled on/off depending on the comparison
> > being measured.
> >
> >
> >
> > -Other planner parameters, including join_collapse_limit, were left at
> > their default values.
> >
>
> Well, this means the DP problem was limited to 8. Is that a fair
> comparison for the other algorithms?
>
> >
> > So these experiments mainly evaluate planning-time scaling and basic
> > plan sanity on a controlled join graph, rather than realistic workload
> > plan quality.
> >
> > I’m currently preparing additional tests using selected JOB queries to
> > provide more meaningful plan-quality comparisons and will share those
> > results once available.
> >
>
> OK
>
>
> --
> Tomas Vondra
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2026-02-16 11:34:17 Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
Previous Message Bertrand Drouvot 2026-02-16 11:17:32 Re: Fix uninitialized xl_running_xacts padding