Re: Subquery pull-up increases jointree search space

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Subquery pull-up increases jointree search space
Date: 2026-06-08 15:27:13
Message-ID: 7008398d-1e85-4843-9a64-8bc53e875cab@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/8/26 15:32, Andrei Lepikhov wrote:
> On 07/06/2026 20:10, Tomas Vondra wrote:
>> On 6/5/26 12:43, Tomas Vondra wrote:
>> After classifying the allocations a bit, I see this:
>>
>> allocation | size
>> --------------------------------------+----------------
>> find_mergeclauses_for_outer_pathkeys | 671 MB
>> generate_join_implied_equalities | 660 MB
>> add_paths_to_joinrel | 242 MB
>> make_inner_pathkeys_for_merge | 195 MB
>> estimate_num_groups | 92 MB
>> get_joinrel_parampathinfo | 90 MB
>> create_memoize_path | 48 MB
>> | 15 MB
>> calc_nestloop_required_outer | 5226 kB
>> (9 rows)
>
> It is fun to use the benchmark script from the previous research on this topic
> [1], conducted in 2009. Query, consuming a lot of memory and planning time [2],
> now shows the following numbers:
>
> join_collapse_limit = 12: Memory: used=0.8GB, Time: 2.5s
> join_collapse_limit = 14: Memory: used=1.1GB, Time: 5.5s
> join_collapse_limit = 18: Memory: used=9.3GB, Time: 50.6s
>
> It's interesting to compare these results with your memory consumption profile.
> I tried using the memtrace patch and the Python script directly, but they gave
> unusual output on my machine, so I couldn't rely on them without more detailed
> instructions.
>

Yeah, the memtrace patch is very hacky, and I haven't given enough
instructions how to use it. The .patch adds elog(LOG) with the MEMTRACE
information (context, action, chunk, size), and the memtrace.py
aggregates it like this:

grep MEMTRACE pg.log | ./memtrace.py > memtrace.log

But the heaptrack seems more convenient, of course.

> Instead, I used the heaptrack tool, which gave me the following
profiles:
> bytes pct function
> collapse limit = 14:
> 354.48 MB 29.3% get_relation_foreign_keys
> 109.07 MB 9.0% generate_join_implied_equalities_normal
> 75.51 MB 6.2% get_eclass_indexes_for_relids
> 67.12 MB 5.5% find_mergeclauses_for_outer_pathkeys
> 67.12 MB 5.5% create_nestloop_path
> 58.73 MB 4.9% make_inner_pathkeys_for_merge
> 38.27 MB 3.2% expression_tree_mutator_impl
> 33.56 MB 2.8% hash_inner_and_outer
> collapse limit = 18:
> 1.23 GB 12.8% generate_join_implied_equalities_normal
> 1.11 GB 11.5% get_joinrel_parampathinfo
> 989.90 MB 10.3% get_eclass_indexes_for_relids
> 755.04 MB 7.9% find_mergeclauses_for_outer_pathkeys
> 721.44 MB 7.5% calc_nestloop_required_outer
> 721.43 MB 7.5% get_param_path_clause_serials
> 629.15 MB 6.5% bms_intersect
> 578.81 MB 6.0% have_unsafe_outer_join_ref
> 411.08 MB 4.3% make_inner_pathkeys_for_merge
> 354.48 MB 3.7% get_relation_foreign_keys
> 335.58 MB 3.5% create_nestloop_path
> 301.99 MB 3.1% generate_join_implied_equalities
>

Hmm, how does heaptrack deal with out memory pools? I was worried
existing memory profilers (like heaptrack) would get confused by our
memory contexts, attributing the whole block to the palloc that just
happens to allocate a new block. But that's not really right.

I see the heaptrack README claims it can work with memory pools after
annotating the code in some way. But there's not much details about
that. Also, it suggests valgrind/massif can already do that.

> Overall, the results for generate_join_implied_equalities and
> find_mergeclauses_for_outer_pathkeys are consistent. This test also highlights
> other sources of memory allocations, such as parameterised paths. The memory
> profile changes as the number of joins in the 'join problem' increases.
>

Yes. I did actually mention get_joinrel_parampathinfo.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message ZizhuanLiu X-MAN 2026-06-08 15:31:21 Re: [PATCH] Doc: Mention OFF as an alias for EXPLAIN SERIALIZE NONE
Previous Message Andres Freund 2026-06-08 15:10:31 Re: Upload only the failed tests logs to the Postgres CI (Cirrus CI)