| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Subquery pull-up increases jointree search space |
| Date: | 2026-06-05 10:43:03 |
| Message-ID: | 760a030e-ccbb-4584-b40e-2fe73c7827fe@vondra.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 2/10/26 17:29, Robert Haas wrote:
> On Mon, Feb 9, 2026 at 3:17 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What I'm wondering about is that join_collapse_limit and
>> from_collapse_limit were invented more than two decades ago, but
>> we've not touched their default values since then. Machines are a
>> lot faster since 2004, and we've probably achieved some net speedups
>> in the planner logic as well. Could we alleviate this concern by
>> raising those defaults, and if so, what are reasonable values in 2026?
>
> The problem as I see it is that the planning time growth is
> exponential, and so faster hardware doesn't necessarily buy you very
> much, especially given that we've added new planner techniques that
> add to the number of paths considered. But I also think that the
> degenerate cases are much worse than the typical cases. For example, I
> seem to remember things like A LEFT JOIN (B1 INNER JOIN B2 INNER JOIN
> B3...) LEFT JOIN (C1 INNER JOIN C2 INNER JOIN C3...) [repeat with D,
> E, etc.] being a problem, maybe for GEQO, because a
> randomly-determined join order isn't likely to be valid. I think there
> are similar issues with join_collapse_limit etc, for example because
> we prefer joins that have joinclauses over those that don't, so the
> actual planner work can be wildly different with the same number of
> joins in the query. I suspect the thing that we need in order to be
> able to safely raise these thresholds is for somebody to spend some
> time figuring out what the pathologically bad cases are and designing
> some sort of mitigations specifically for those. Or, alternatively, we
> could decide that we've been too pessimistic and set slightly riskier
> values by default, expecting that they'll work out most of the time
> and that users can lower the setting if there's an issue.
>
Sorry to revive this thread from February, but I've been wondering about
the same thing (possibility to increase join_collapse_limit) in the
context of the starjoin planning thread.
I've decided to do a simple stress-test experiment - generate random
joins of 2-N tables, and measure how long the planning takes. See the
attached python script - it's not super elaborate, the joins are normal
joins (no lateral, no subqueries, ...).
Attached is also a PDF with results. Each dot is one random join, the
charts on the right have log-scale y-axis. I think the conclusion is
increasing join_collapse_limit would require more work. Not just because
of the plan time, but (more importantly?) the memory needed.
There are significant variations, easily 2-3 orders of magnitude, and
it's getting worse with the number of tables. A join of 10 tables can
take 5-1000 ms.
The thing that surprised me a bit is how much memory the larger joins
may need for planning (measured by getrusage and log_planner_stats).
With 8 tables, the backend generally fits into 50MB. But then it
explodes (well, it's exponential, but here it starts to grow a lot), and
with 10 tables we may need 500MB, with 12 it's 5GB, ...
At 13 tables some of the joins start crashing because the machine only
has 64GB of RAM. That's why the chart "levels off" at ~50GB, because the
crashed queries are not included.
FWIW I don't think the joins are particularly nasty. I suspect these are
not the actual "worst" cases, we could probably construct worse ones.
I'm not sure about the plan time - maybe it got much better than it
used to be, either because we optimized some stuff and/or because CPUs
got better (not sure that's sufficient on it's own). But ISTM the memory
usage could be a worse of the two, because it does not make the query
just go slower, it can crash the instance.
So to allow increasing join_collapse_limit, we'd need to address this,
somehow. I have not investigated what exactly uses the memory. Some of
it may be easy to free, but I recall we have difficulties with freeing
some of the stuff because we don't know if it's still referenced.
FWIW I suspect the memory usage may be a significant contributing factor
for the plan time, because I'm seeing asm_exc_page_fault in the profiles
(which reminds me of [1]). But here mallopt/MALLOC_TOP_PAD_ can't help
with these amounts of memory. So lowering the memory usage might also
help with the plan time quite a bit.
regards
[1] https://vondra.me/posts/tuning-the-glibc-allocator-for-postgres/
--
Tomas Vondra
| Attachment | Content-Type | Size |
|---|---|---|
| join planning stats.pdf | application/pdf | 408.3 KB |
| join-script.tgz | application/x-compressed-tar | 1.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dilip Kumar | 2026-06-05 10:51:56 | Re: Proposal: Conflict log history table for Logical Replication |
| Previous Message | vignesh C | 2026-06-05 10:09:49 | Re: [PATCH] Preserve replication origin OIDs in pg_upgrade |