Re: Subquery pull-up increases jointree search space

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Subquery pull-up increases jointree search space
Date: 2026-06-08 07:45:23
Message-ID: cdfa414e-2b7b-4921-a87b-2ab9b27d7bb6@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/06/2026 12:43, Tomas Vondra wrote:
> On 2/10/26 17:29, Robert Haas wrote:
>> On Mon, Feb 9, 2026 at 3:17 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> What I'm wondering about is that join_collapse_limit and
>>> from_collapse_limit were invented more than two decades ago, but
>>> we've not touched their default values since then. Machines are a
>>> lot faster since 2004, and we've probably achieved some net speedups
>>> in the planner logic as well. Could we alleviate this concern by
>>> raising those defaults, and if so, what are reasonable values in 2026?
>>
>> The problem as I see it is that the planning time growth is
>> exponential, and so faster hardware doesn't necessarily buy you very
>> much, especially given that we've added new planner techniques that
>> add to the number of paths considered. But I also think that the
>> degenerate cases are much worse than the typical cases. For example, I
>> seem to remember things like A LEFT JOIN (B1 INNER JOIN B2 INNER JOIN
>> B3...) LEFT JOIN (C1 INNER JOIN C2 INNER JOIN C3...) [repeat with D,
>> E, etc.] being a problem, maybe for GEQO, because a
>> randomly-determined join order isn't likely to be valid. I think there
>> are similar issues with join_collapse_limit etc, for example because
>> we prefer joins that have joinclauses over those that don't, so the
>> actual planner work can be wildly different with the same number of
>> joins in the query. I suspect the thing that we need in order to be
>> able to safely raise these thresholds is for somebody to spend some
>> time figuring out what the pathologically bad cases are and designing
>> some sort of mitigations specifically for those. Or, alternatively, we
>> could decide that we've been too pessimistic and set slightly riskier
>> values by default, expecting that they'll work out most of the time
>> and that users can lower the setting if there's an issue.
>>
>
> Sorry to revive this thread from February, but I've been wondering about
> the same thing (possibility to increase join_collapse_limit) in the
> context of the starjoin planning thread.
>
> I've decided to do a simple stress-test experiment - generate random
> joins of 2-N tables, and measure how long the planning takes. See the
> attached python script - it's not super elaborate, the joins are normal
> joins (no lateral, no subqueries, ...).
Thanks.
This thread induced by the pain of the ORM world, where companies have a
stringent query response time. In such cases, query planning time is highly
important, and the number of joins varies widely (I'd say, unpredictably).

In my view, the purpose of a collapse limit in these cases is to keep planning
time reasonable and execution time steady. That’s why companies adjust these
GUCs carefully. Here’s a common example, typical for current widely used hardware:

from_collapse_limit = 20
join_collapse_limit = 20
geqo_threshold = 12

In this thread, I didn’t try to fix the issue by raising the collapse limit
because that doesn’t solve the problem. Instead, to make planning decisions more
stable, I suggest rearranging the jointree so the pulled-up jointree is closer
to the referenced outer subtree.

I’m not saying this approach should be added to the core, but it would be
helpful to have a way for extensions to influence rewriting decisions.

--
regards, Andrei Lepikhov,
pgEdge

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ewan Young 2026-06-08 07:59:59 Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations
Previous Message Pavel Stehule 2026-06-08 06:39:06 Re: bugfix - fix broken output in expanded aligned format, when data are too short