Re:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset

From: ocean_li_996 <ocean_li_996(at)163(dot)com>
To: "mohen(dot)lhy(at)alibaba-inc(dot)com " <mohen(dot)lhy(at)alibaba-inc(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org " <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Cc: feichanghong(at)qq(dot)com, jdavis(at)postgresql(dot)org
Subject: Re:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
Date: 2025-09-03 01:43:12
Message-ID: 58062871.112.1990d3e06ea.Coremail.ocean_li_996@163.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

For reasons I'm not yet aware of, my email reply to the bug report
did not show up in the discussion thread. I am therefore resending
the patch in a new email.

I've attached 'v01_fix_memory_leak_in_hashed_subplan_node.patch' to address this.

--
Thanks,
Haiyang Li

----- Original Message -----
From: "PG Bug reporting form" <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: mohen(dot)lhy(at)alibaba-inc(dot)com
Sent: Tue, 02 Sep 2025 15:58:49 +0000
Subject: BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset

The following bug has been logged on the website:

Bug reference: 19040
Logged by: haiyang li
Email address: mohen(dot)lhy(at)alibaba-inc(dot)com
PostgreSQL version: 18beta3
Operating system: centos7 5.10.84 x86_64
Description:

Hello, all!

I found a query which consumes a lot of memory and triggers OOM killer.
Memory leak occurs in hashed subplan node.

I was able to create reproducible test case on machine with default config
and postgresql 18beta3:

CREATE TABLE test1(
a numeric,
b int);
INSERT INTO
test1
SELECT
i,
i
FROM
generate_series(1, 30000000) i; -- Make the running time longer
EXPLAIN ANALYZE SELECT
*
FROM
test1
WHERE
a NOT IN(
SELECT
i
FROM
generate_series(1, 10000) i
);

plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on test1 (cost=125.00..612432.24 rows=15000108 width=10) (actual
time=135.191..25832.808 rows=29990000 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 10000
SubPlan 1
-> Function Scan on generate_series i (cost=0.00..100.00 rows=10000
width=4) (actual time=36.999..38.296 rows=10000 loops=1)
Planning Time: 0.280 ms
JIT:
Functions: 15
Options: Inlining true, Optimization true, Expressions true, Deforming
true
Timing: Generation 1.155 ms, Inlining 25.929 ms, Optimization 60.700 ms,
Emission 23.018 ms, Total 110.802 ms
Execution Time: 28217.026 ms
(11 rows)

I observed that the process's RES (resident memory) was increasing rapidly
during SQL execution by using 'top -p <pid>' command.

Furthermore, during SQL execution, I ran 'select
pg_log_backend_memory_contexts(<pid>)'
to print memory context statistics. The context with abnormally high memory
usage was
"Subplan HashTable Temp Context." The key part of the log is as follows:

...
LOG: level: 5; Subplan HashTable Temp Context: 514834432 total in 62849
blocks; 973712 free (60695 chunks); 513860720 used
LOG: level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808
free (5 chunks); 400480 used
...
Grand total: 518275344 bytes in 63116 blocks; 2025560 free (60976 chunks);
516249784 used
...

If I change the SQL from "a NOT IN" to "b NOT IN" and do the same action, I
can not
observe abnormally high memory usage. Likewise, the key part of the log is
as follows:
...
LOG: level: 5; Subplan HashTable Temp Context: 1024 total in 1 blocks; 784
free (0 chunks); 240 used
LOG: level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808
free (5 chunks); 400480 used
...
Grand total: 3441936 bytes in 268 blocks; 1050520 free (281 chunks); 2391416
used
...

While analyzing the source code, I found that the hashed subplan node fails
to reset
the 'hashtempcxt' context after probing the hash table for each slot.

When variable-length datatypes (e.g., numeric) are processed, this can
trigger calls
to 'detoast_attr', which allocate memory in hashtempcxt. Without a reset,
this memory
is not reclaimed until the context itself is destroyed, resulting in a
memory leak
when processing large numbers of slots.

A patch implementing this fix will be included in the follow-up email.

--
Thanks,
Haiyang Li

Attachment Content-Type Size
v01_fix_memory_leak_in_hashed_subplan_node.patch application/octet-stream 4.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message zhouenbing 2025-09-03 02:00:13 答复: empty,query_id, pg_stat_activity
Previous Message Tom Lane 2025-09-03 01:19:15 Re: BUG #19037: Planner fails on estimating array length with "no relation entry" error