Quick Links

Re: BUG #19030: Hash join leads to extremely high memory usage

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	moj(at)dshare(dot)de
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #19030: Hash join leads to extremely high memory usage
Date:	2025-08-23 22:29:03
Message-ID:	51461.1755988143@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> Description
> - Two tables, left join
> - The left table has significantly fewer rows than the right table
> - The left table has very large rows (many columns with high memory usage)
> - The left table has many null values in the join column
> - A hash join is used
> - The hash node is built from the left table
> - The query results in extremely high memory usage (100x work_mem in the
> example, > 1000x in real case)

If this is specific to the case of many null join values, it's a known
problem that I have a patch in the queue for [1].

On your example, I get this on HEAD:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=459884.00..1367112.00 rows=10000000 width=65) (actual time=2557.730..11727.429 rows=10000000.00 loops=1)
Hash Cond: (right_.c = left_.c)
Buffers: shared read=201216, temp read=131098 written=131098
-> Seq Scan on right_ (cost=0.00..283520.00 rows=20000000 width=8) (actual time=0.060..806.467 rows=20000000.00 loops=1)
Buffers: shared read=83520
-> Hash (cost=217696.00..217696.00 rows=10000000 width=65) (actual time=2494.680..2494.680 rows=10000000.00 loops=1)
Buckets: 131072 (originally 131072) Batches: 16384 (originally 256) Memory Usage: 440509kB
Buffers: shared read=117696, temp written=46737
-> Seq Scan on left_ (cost=0.00..217696.00 rows=10000000 width=65) (actual time=0.096..481.139 rows=10000000.00 loops=1)
Buffers: shared read=117696
Planning:
Buffers: shared hit=136 read=35
Memory: used=21kB allocated=32kB
Planning Time: 0.703 ms
Execution Time: 11946.100 ms
(15 rows)

and this with the aforesaid patch:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=459884.00..1367112.00 rows=10000000 width=65) (actual time=1553.342..6589.352 rows=10000000.00 loops=1)
Hash Cond: (right_.c = left_.c)
Buffers: shared hit=188 read=201028, temp read=128626 written=128626
-> Seq Scan on right_ (cost=0.00..283520.00 rows=20000000 width=8) (actual time=0.099..750.684 rows=20000000.00 loops=1)
Buffers: shared hit=94 read=83426
-> Hash (cost=217696.00..217696.00 rows=10000000 width=65) (actual time=1551.739..1551.740 rows=5000000.00 loops=1)
Buckets: 131072 Batches: 256 Memory Usage: 2906kB
Buffers: shared hit=94 read=117602, temp written=93662
-> Seq Scan on left_ (cost=0.00..217696.00 rows=10000000 width=65) (actual time=0.106..470.671 rows=10000000.00 loops=1)
Buffers: shared hit=94 read=117602
Planning:
Memory: used=20kB allocated=32kB
Planning Time: 0.122 ms
Execution Time: 6827.197 ms
(14 rows)

(Hmm, looking at this, it's apparent that the patch causes the
Hash node not to count the null-keyed rows in its EXPLAIN output.
On the one hand, that's an accurate reflection of how much went into
the hash table, but on the other hand it's pretty confusing.)

I doubt we'd consider back-patching such a change, but if you want
to see it happen for v19, you could help by reviewing/testing.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3061845.1746486714%40sss.pgh.pa.us

In response to

BUG #19030: Hash join leads to extremely high memory usage at 2025-08-23 15:30:13 from PG Bug reporting form

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Amit Kapila	2025-08-25 03:51:02	Re: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
Previous Message	PG Bug reporting form	2025-08-23 15:30:13	BUG #19030: Hash join leads to extremely high memory usage