| From: | Ben Mejia <benjamin(dot)arthur(dot)mejia(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me> |
| Subject: | hashjoins vs. bitmap filters |
| Date: | 2026-06-27 00:59:26 |
| Message-ID: | b00303b8-ca45-40a6-9e50-5410a02437a1@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
This is the bucket-occupancy bitmap filter I described on the
"hashjoins vs. Bloom filters (yet again)" thread [1]. Tomas suggested
posting the patch the patch I described in [2] in a separate thread for
cfbot, so here it is as a two-patch series.
Similar to Tomas's patch to use a Bloom filter in hashjoins to avoid
probes, I have these two patches that use a bitmap filter for the same
two cases. As Tomas points out, a bitmap filter can be viewed as a Bloom
filter with a single hash function.
The idea: while building the inner side, set one bit per occupied
hash bucket (one bit per bucket, sized directly by nbuckets -- ~4kB for
32k buckets). An empty bucket means no possible match, with no false
negatives. Compared to a Bloom filter it is less discriminating but far
cheaper to build and probe, and needs no sizing/false-positive tuning.
It exploits this in the two cases from the older Bloom-filter work:
0001 - pre-spill outer drop (multi-batch). During outer
partitioning, an outer tuple whose target batch's bucket is empty cannot
match, so it is dropped before being written to the batch file -- saving
the spill write and subsequent read. Inner and semi joins only. Also,
this currently disables itself when nbatch grows at runtime (batch
doubling), so it targets stable multi-batch plans rather than the
runaway-batch case. (See [3] for a discussion on dealing with runaway
batches.)
0002 - single-batch probe skip. While probing a single-batch hash
table, an empty bucket short-circuits the chain walk. Safe for any join
type (it only avoids work on a provably-empty bucket). Pays off mainly
with a high miss rate and a bucket array larger than L2/L3.
Both carry an adaptive runtime guard: they sample the drop/skip rate
over a window and stop consulting the bitmap if it falls below
break-even, so dense (e.g. FK) joins do not pay the per-tuple check for
nothing.
Rough numbers (serial join, 8-core x86-64, work_mem 1-8MB):
0001, by probe selectivity: ~95% miss: +18% to +36%
~37% miss: +9% to +13%
FK-like (~0% miss): flat
0002, single-batch high-miss: up to ~15% on cache-friendly data
Both are gated by GUCs (enable_hashjoin_prefilter,
enable_hashjoin_probe_filter), default off. They're experimental toggles
for now -- no config.sgml docs yet, pending design feedback. Serial hash
join only at this stage.
Some known issues: The naming needs a little work... prefilter and
probe filter are pretty generic and could use better names-- I just
can't think of new ones right now. Also, the memory used for the bitmaps
is not included or reported in spaceUsed for now.
[1]
https://www.postgresql.org/message-id/5cd8c20c-14b5-4b0d-bedc-69bf714e87eb@vondra.me
[2]
https://www.postgresql.org/message-id/2277c338-87ee-424c-a03c-4b6f589ccf26@gmail.com
[3]
https://www.postgresql.org/message-id/9a3604ae-b36d-492d-a74b-a3f57cdfd5ae@gmail.com
Regards,
Ben Mejia
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Add-enable_hashjoin_prefilter-drop-non-matching-o.patch | text/plain | 23.3 KB |
| v1-0002-Add-a-bitmap-filter-for-single-batch-prefiltering.patch | text/plain | 16.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Henson Choi | 2026-06-27 01:11:34 | Re: [PATCH] Fix replica identity mismatch for partitioned tables with publish_via_partition_root |
| Previous Message | Henson Choi | 2026-06-27 00:23:45 | Re: Return value of XLogInsertRecord() for XLOG_SWITCH record |