Quick Links

hashjoins vs. bitmap filters

From:	Ben Mejia <benjamin(dot)arthur(dot)mejia(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Tomas Vondra <tomas(at)vondra(dot)me>
Subject:	hashjoins vs. bitmap filters
Date:	2026-06-27 00:59:26
Message-ID:	b00303b8-ca45-40a6-9e50-5410a02437a1@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

This is the bucket-occupancy bitmap filter I described on the
"hashjoins vs. Bloom filters (yet again)" thread [1]. Tomas suggested
posting the patch the patch I described in [2] in a separate thread for
cfbot, so here it is as a two-patch series.

Similar to Tomas's patch to use a Bloom filter in hashjoins to avoid
probes, I have these two patches that use a bitmap filter for the same
two cases. As Tomas points out, a bitmap filter can be viewed as a Bloom
filter with a single hash function.

The idea: while building the inner side, set one bit per occupied
hash bucket (one bit per bucket, sized directly by nbuckets -- ~4kB for
32k buckets). An empty bucket means no possible match, with no false
negatives. Compared to a Bloom filter it is less discriminating but far
cheaper to build and probe, and needs no sizing/false-positive tuning.

It exploits this in the two cases from the older Bloom-filter work:

0001 - pre-spill outer drop (multi-batch). During outer
partitioning, an outer tuple whose target batch's bucket is empty cannot
match, so it is dropped before being written to the batch file -- saving
the spill write and subsequent read. Inner and semi joins only. Also,
this currently disables itself when nbatch grows at runtime (batch
doubling), so it targets stable multi-batch plans rather than the
runaway-batch case. (See [3] for a discussion on dealing with runaway
batches.)

0002 - single-batch probe skip. While probing a single-batch hash
table, an empty bucket short-circuits the chain walk. Safe for any join
type (it only avoids work on a provably-empty bucket). Pays off mainly
with a high miss rate and a bucket array larger than L2/L3.

Both carry an adaptive runtime guard: they sample the drop/skip rate
over a window and stop consulting the bitmap if it falls below
break-even, so dense (e.g. FK) joins do not pay the per-tuple check for
nothing.

Rough numbers (serial join, 8-core x86-64, work_mem 1-8MB):

0001, by probe selectivity: ~95% miss: +18% to +36%
~37% miss: +9% to +13%
FK-like (~0% miss): flat
0002, single-batch high-miss: up to ~15% on cache-friendly data

Both are gated by GUCs (enable_hashjoin_prefilter,
enable_hashjoin_probe_filter), default off. They're experimental toggles
for now -- no config.sgml docs yet, pending design feedback. Serial hash
join only at this stage.

Some known issues: The naming needs a little work... prefilter and
probe filter are pretty generic and could use better names-- I just
can't think of new ones right now. Also, the memory used for the bitmaps
is not included or reported in spaceUsed for now.

[1]
https://www.postgresql.org/message-id/5cd8c20c-14b5-4b0d-bedc-69bf714e87eb@vondra.me
[2]
https://www.postgresql.org/message-id/2277c338-87ee-424c-a03c-4b6f589ccf26@gmail.com
[3]
https://www.postgresql.org/message-id/9a3604ae-b36d-492d-a74b-a3f57cdfd5ae@gmail.com

Regards,
Ben Mejia

Attachment	Content-Type	Size
v1-0001-Add-enable_hashjoin_prefilter-drop-non-matching-o.patch	text/plain	23.3 KB
v1-0002-Add-a-bitmap-filter-for-single-batch-prefiltering.patch	text/plain	16.0 KB

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Henson Choi	2026-06-27 01:11:34	Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root
Previous Message	Henson Choi	2026-06-27 00:23:45	Re: Return value of XLogInsertRecord() for XLOG_SWITCH record