Re: BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset

From: feichanghong <feichanghong(at)qq(dot)com>
To: ocean_li_996 <ocean_li_996(at)163(dot)com>
Cc: "mohen(dot)lhy(at)alibaba-inc(dot)com" <mohen(dot)lhy(at)alibaba-inc(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, jdavis(at)postgresql(dot)org
Subject: Re: BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
Date: 2025-09-03 02:20:48
Message-ID: tencent_8E54DE49323C3A293EE44DFE385011189509@qq.com
Lists: pgsql-bugs

Hi,

> On Sep 3, 2025, at 09:43, ocean_li_996 <ocean_li_996(at)163(dot)com> wrote:
> I've attached 'v01_fix_memory_leak_in_hashed_subplan_node.patch' to address this.

It seems this issue has been around for many years. I took a quick
look at the patch. Why don't we reset the temp context inside the
LookupTupleHashEntry, TupleHashTableHash, LookupTupleHashEntryHash,
and FindTupleHashEntry functions themselves? That seems more robust.
Furthermore, the added test case doesn't seem able to detect whether
there is a memory leak.

>
> ----- Original Message -----
> From: "PG Bug reporting form" <noreply(at)postgresql(dot)org>
> To: pgsql-bugs(at)lists(dot)postgresql(dot)org
> Cc: mohen(dot)lhy(at)alibaba-inc(dot)com
> Sent: Tue, 02 Sep 2025 15:58:49 +0000
> Subject: BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
>
> The following bug has been logged on the website:
>
> Bug reference: 19040
> Logged by: haiyang li
> Email address: mohen(dot)lhy(at)alibaba-inc(dot)com
> PostgreSQL version: 18beta3
> Operating system: centos7 5.10.84 x86_64
> Description:
>
> Hello, all!
>
> I found a query which consumes a lot of memory and triggers the OOM
> killer. The memory leak occurs in the hashed subplan node.
>
> I was able to create a reproducible test case on a machine with the
> default config and PostgreSQL 18beta3:
>
> CREATE TABLE test1(
>     a numeric,
>     b int);
> INSERT INTO test1
> SELECT i, i
> FROM generate_series(1, 30000000) i;  -- make the running time longer
> EXPLAIN ANALYZE SELECT *
> FROM test1
> WHERE a NOT IN (
>     SELECT i
>     FROM generate_series(1, 10000) i
> );
>
> plan:
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on test1  (cost=125.00..612432.24 rows=15000108 width=10) (actual time=135.191..25832.808 rows=29990000 loops=1)
>   Filter: (NOT (hashed SubPlan 1))
>   Rows Removed by Filter: 10000
>   SubPlan 1
>     ->  Function Scan on generate_series i  (cost=0.00..100.00 rows=10000 width=4) (actual time=36.999..38.296 rows=10000 loops=1)
> Planning Time: 0.280 ms
> JIT:
>   Functions: 15
>   Options: Inlining true, Optimization true, Expressions true, Deforming true
>   Timing: Generation 1.155 ms, Inlining 25.929 ms, Optimization 60.700 ms, Emission 23.018 ms, Total 110.802 ms
> Execution Time: 28217.026 ms
> (11 rows)
>
>
> I observed that the process's RES (resident memory) was increasing rapidly
> during SQL execution by using the 'top -p <pid>' command.
>
> Furthermore, during SQL execution, I ran 'select pg_log_backend_memory_contexts(<pid>)'
> to print memory context statistics. The context with abnormally high
> memory usage was "Subplan HashTable Temp Context". The key part of the
> log is as follows:
>
> ...
> LOG: level: 5; Subplan HashTable Temp Context: 514834432 total in 62849 blocks; 973712 free (60695 chunks); 513860720 used
> LOG: level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808 free (5 chunks); 400480 used
> ...
> Grand total: 518275344 bytes in 63116 blocks; 2025560 free (60976 chunks); 516249784 used
> ...
>
> If I change the SQL from "a NOT IN" to "b NOT IN" and repeat the same
> steps, I cannot observe the abnormally high memory usage. Likewise, the
> key part of the log is as follows:
> ...
> LOG: level: 5; Subplan HashTable Temp Context: 1024 total in 1 blocks; 784 free (0 chunks); 240 used
> LOG: level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808 free (5 chunks); 400480 used
> ...
> Grand total: 3441936 bytes in 268 blocks; 1050520 free (281 chunks); 2391416 used
> ...
>
> While analyzing the source code, I found that the hashed subplan node
> fails to reset the 'hashtempcxt' context after probing the hash table
> for each slot.
>
> When variable-length datatypes (e.g., numeric) are processed, this can
> trigger calls to 'detoast_attr', which allocate memory in 'hashtempcxt'.
> Without a reset, this memory is not reclaimed until the context itself
> is destroyed, resulting in a memory leak when processing a large number
> of slots.

Additionally, through testing, I've found that if test1 is a partitioned
table, multiple "Subplan HashTable Context" instances exist, and these
memory contexts are only released after execution completes. If the
number of subpartitions is large, this can lead to significant memory
usage. Doesn't this differ from what the execution plan shows? The plan
displays only one SubPlan node.

Best Regards,
Fei Changhong
