| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Oleg Bartunov <obartunov(at)postgrespro(dot)ru> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: hashjoins vs. Bloom filters (yet again) |
| Date: | 2026-06-03 10:07:40 |
| Message-ID: | eb4c51c7-2758-447f-8e65-50f4ff0d36da@vondra.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 6/3/26 11:20, Oleg Bartunov wrote:
> ...
>
> Bloom filters have two rather different roles here.
>
> For a local Hash Join optimization, Bloom does not require any
> particular physical ordering of the heap. It can be useful simply when
> the join is selective enough, or when batching/spilling makes failed
> probes expensive: the Bloom filter rejects many outer tuples before a
> full hash-table probe or before writing them to temporary batches.
>
Right. Adding a filter within a hash join is certainly less ambitious,
and the possible benefits are smaller.
> But once we talk about pushing a runtime filter down to the scan/storage
> layer, the physical preconditions become crucial. To get more than a
> cheap per-row check, the scan must have something coarse-grained to
> skip: partitions, row groups, chunks, block ranges, dictionaries, min/
> max metadata, BRIN-like summaries, etc. Without that, the filter is
> still correct, but the benefit is mostly CPU/probe reduction rather than
> avoiding data production.
>
Maybe, but there's also ongoing work on adding batches to the executor,
in which case we'd eliminate "row groups" even when using a filter in
the scope of a hashjoin operator. Of course, the tuples will flow all
the way up to that operator.
> So for me the most interesting part of this thread is not Bloom itself,
> but the architectural idea: pushing runtime knowledge down to the scan
> node, against the normal direction of data flow. The build side of a
> join produces compact knowledge about admissible keys, and lower layers
> may use it before rows are materialized and sent upward.
>
> I saw this in my own experiments with zone/chunk-oriented storage for
> Postgres: static predicates could prune zones nicely, but joins were the
> hard case because the useful filtering knowledge was produced above the
> scan. A runtime semi-join filter pushed from the Hash Join build side
> into the scan could turn join-derived knowledge into scan-level pruning.
>
> For example:
>
> SELECT sum(e.cost)
> FROM events e
> JOIN accounts a ON e.account_id = a.id <http://a.id>
> WHERE a.region = 'NP'; -- Nepal
>
> The events scan does not know which account_id values are EU accounts.
> That knowledge is produced above it, on the build side of the join. A
> runtime semi-join filter pushed from the Hash Join build side down into
> the events scan could let the scan reject impossible account_id values
> before producing tuples.
>
Yes. This is known as "predicate transfer" in academic papers.
> For a plain heap scan this may mostly save hash probes. But with zone/
> chunk-oriented storage, where chunks have dictionaries, min/max
> metadata, Bloom summaries, or tenant ranges, the same runtime filter can
> skip whole chunks. That is the part I find most interesting: turning
> join-derived knowledge into scan-level pruning, against the normal
> direction of data flow.
>
> Bloom is just one carrier for that knowledge. The real feature is a
> pluggable runtime-filter mechanism that heap, CustomScan, FDW, columnar/
> table AMs, partitioned storage, or chunk/cold storage can consume at the
> level they understand.
>
> This may be a topic for a separate thread, because it quickly becomes
> less about Hash Join Bloom filters and more about runtime knowledge
> pushdown into storage.
>
Right, there's a general concept of a "filter", and Bloom filters are
just one example of that. And maybe we could build other types of
filters more suitable for the scan. But I think it'll still be tied to a
hash join, because what other nodes / joins can build the filter?
regards
--
Tomas Vondra
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ilmar Yunusov | 2026-06-03 10:22:19 | Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits |
| Previous Message | Nisha Moond | 2026-06-03 09:54:52 | Re: Proposal: Conflict log history table for Logical Replication |