| From: | Ben Mejia <benjamin(dot)arthur(dot)mejia(at)gmail(dot)com> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: hashjoins vs. Bloom filters (yet again) |
| Date: | 2026-06-23 22:36:06 |
| Message-ID: | 2277c338-87ee-424c-a03c-4b6f589ccf26@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 5/29/26 5:55 PM, Tomas Vondra wrote:
> old patches
> -----------
>
> Those old patches tried to do a fairly small thing during a hash join,
> and that's building a Bloom filter on the inner relation (the one that
> gets hashed), and then use that filter before probing the hash table.
>
> The benefits come from Bloom filters being (fairly) cheap, and a
> negative answer (hash is not in the filter) may allows us to skip a much
> more expensive operation.
>
> The old threads patches focused especially at two hash join cases:
>
> (a) A very selective join, i.e. a significant fraction of outer tuples
> does not have a match in the hash table.
>
> (b) A selective hash join forced to do batching because the hash table
> is too large, and thus forced to spill outer tuples to temporary files.
>
> For (a), the benefit comes from Bloom filters being much cheaper to
> probe than a hash table. The exact cost depends on the implementation,
> sizes, etc. We're in the ballpark of 50 vs. 500 cycles, maybe. But if
> the filter discards 90% of tuples, it can be a big win.
>
> For (b), the filter (for all the batches at once) allows us to discard
> some of the outer tuples without writing them to temporary files. Which
> is way more expensive than probing a hash table.
As it happens, I've been exploring the use of a bitmap filter for the
same two cases you mention. This has some relevance to the issues you
mention in your post about sizing, false-positive rate, etc.
Instead of a Bloom filter, I chose to use a bitmap filter, with one bit
per bucket on the build side. As the inner table is built, I set a bit
in the bitmap filter for every occupied bucket. If a bucket is empty,
there are no matching hashes and those hash values can be skipped where
appropriate. The advantages of this bitmap over a Bloom filter are:
- sizing is pre-determined by nbuckets
- small bitmaps (4k for 32k buckets)
- cheaper - nominal cost to set/check bits
A well-chosen Bloom filter will be more discriminating, but the bitmap
has the same no-false-negatives guarantee and costs much less space and
time to build.
I implemented both of your cases:
Drop-before-spill: (Case b)
Build per-batch bitmaps during inner partition pass and drop tuples that
don't have a bit set. Saves I/O on tuples that will never match. This
only works for inner and semi joins.
Single-Batch probe: (Case a)
Only pays off in high-miss-rate joins and a bucket array larger than
L2/L3 cache. This case has a higher penalty for hash table lookup than
the in-cache bitmap check. This case works in multi-batch, but the I/O
cost dominates and there is no gain.
I put runtime guards on both of these; I sampled the drop rate over a
window and disable the filter for the rest of the pass if the rate falls
below a threshold. (~5% for case b; ~25% for case a)
The benchmarks are encouraging:
For case a, I was able to see a best-case improvement of ~15% for
carefully chosen data (dependent on L2/L3 cache size).
For case b, I tested 3 cases with sparse, average and dense probe hits:
sparse probe (~95% miss): +18% to +36%
avg probe (~37% miss): +9% to +13%
dense probe (FK-like, ~0% miss): flat, within noise
(This was on a 8-core x86-64, L1 32KB/core, L2 4MB/core, L3 32MB, 31 GB
RAM. PostgreSQL 19devel, serial hash join,
max_parallel_workers_per_gather = 0, across work_mem = 1-8MB)
Happy to share the patch and full benchmark data if useful.
-Ben Mejia
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bharath Rupireddy | 2026-06-23 22:40:12 | Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE |
| Previous Message | Michael Paquier | 2026-06-23 22:13:41 | Re: [PATCH] Warn when io_min_workers exceeds io_max_workers |