Quick Links

Re: hashjoins vs. Bloom filters (yet again)

From:	Ben Mejia <benjamin(dot)arthur(dot)mejia(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: hashjoins vs. Bloom filters (yet again)
Date:	2026-06-23 22:36:06
Message-ID:	2277c338-87ee-424c-a03c-4b6f589ccf26@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 5/29/26 5:55 PM, Tomas Vondra wrote:

> old patches
> -----------
>
> Those old patches tried to do a fairly small thing during a hash join,
> and that's building a Bloom filter on the inner relation (the one that
> gets hashed), and then use that filter before probing the hash table.
>
> The benefits come from Bloom filters being (fairly) cheap, and a
> negative answer (hash is not in the filter) may allows us to skip a much
> more expensive operation.
>
> The old threads patches focused especially at two hash join cases:
>
> (a) A very selective join, i.e. a significant fraction of outer tuples
> does not have a match in the hash table.
>
> (b) A selective hash join forced to do batching because the hash table
> is too large, and thus forced to spill outer tuples to temporary files.
>
> For (a), the benefit comes from Bloom filters being much cheaper to
> probe than a hash table. The exact cost depends on the implementation,
> sizes, etc. We're in the ballpark of 50 vs. 500 cycles, maybe. But if
> the filter discards 90% of tuples, it can be a big win.
>
> For (b), the filter (for all the batches at once) allows us to discard
> some of the outer tuples without writing them to temporary files. Which
> is way more expensive than probing a hash table.
As it happens, I've been exploring the use of a bitmap filter for the
same two cases you mention. This has some relevance to the issues you
mention in your post about sizing, false-positive rate, etc.

Instead of a Bloom filter, I chose to use a bitmap filter, with one bit
per bucket on the build side. As the inner table is built, I set a bit
in the bitmap filter for every occupied bucket. If a bucket is empty,
there are no matching hashes and those hash values can be skipped where
appropriate. The advantages of this bitmap over a Bloom filter are:

- sizing is pre-determined by nbuckets
- small bitmaps (4k for 32k buckets)
- cheaper - nominal cost to set/check bits

A well-chosen Bloom filter will be more discriminating, but the bitmap
has the same no-false-negatives guarantee and costs much less space and
time to build.

I implemented both of your cases:

Drop-before-spill: (Case b)
Build per-batch bitmaps during inner partition pass and drop tuples that
don't have a bit set. Saves I/O on tuples that will never match. This
only works for inner and semi joins.

Single-Batch probe: (Case a)
Only pays off in high-miss-rate joins and a bucket array larger than
L2/L3 cache. This case has a higher penalty for hash table lookup than
the in-cache bitmap check. This case works in multi-batch, but the I/O
cost dominates and there is no gain.

I put runtime guards on both of these; I sampled the drop rate over a
window and disable the filter for the rest of the pass if the rate falls
below a threshold. (~5% for case b; ~25% for case a)

The benchmarks are encouraging:

For case a, I was able to see a best-case improvement of ~15% for
carefully chosen data (dependent on L2/L3 cache size).

For case b, I tested 3 cases with sparse, average and dense probe hits:

sparse probe (~95% miss): +18% to +36%
avg probe (~37% miss): +9% to +13%
dense probe (FK-like, ~0% miss): flat, within noise

(This was on a 8-core x86-64, L1 32KB/core, L2 4MB/core, L3 32MB, 31 GB
RAM. PostgreSQL 19devel, serial hash join,
max_parallel_workers_per_gather = 0, across work_mem = 1-8MB)

Happy to share the patch and full benchmark data if useful.

-Ben Mejia

In response to

hashjoins vs. Bloom filters (yet again) at 2026-05-30 00:55:43 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bharath Rupireddy	2026-06-23 22:40:12	Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Previous Message	Michael Paquier	2026-06-23 22:13:41	Re: [PATCH] Warn when io_min_workers exceeds io_max_workers