Re: hashjoins vs. Bloom filters (yet again)

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Oleg Bartunov <obartunov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hashjoins vs. Bloom filters (yet again)
Date: 2026-06-03 10:07:40
Message-ID: eb4c51c7-2758-447f-8e65-50f4ff0d36da@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6/3/26 11:20, Oleg Bartunov wrote:
> ...
>
> Bloom filters have two rather different roles here.
>
> For a local Hash Join optimization, Bloom does not require any
> particular physical ordering of the heap. It can be useful simply when
> the join is selective enough, or when batching/spilling makes failed
> probes expensive: the Bloom filter rejects many outer tuples before a
> full hash-table probe or before writing them to temporary batches.
>

Right. Adding a filter within a hash join is certainly less ambitious,
and the possible benefits are smaller.

> But once we talk about pushing a runtime filter down to the scan/storage
> layer, the physical preconditions become crucial. To get more than a
> cheap per-row check, the scan must have something coarse-grained to
> skip: partitions, row groups, chunks, block ranges, dictionaries, min/
> max metadata, BRIN-like summaries, etc. Without that, the filter is
> still correct, but the benefit is mostly CPU/probe reduction rather than
> avoiding data production.
>

Maybe, but there's also ongoing work on adding batches to the executor,
in which case we'd eliminate "row groups" even when using a filter in
the scope of a hashjoin operator. Of course, the tuples will flow all
the way up to that operator.

> So for me the most interesting part of this thread is not Bloom itself,
> but the architectural idea: pushing runtime knowledge down to the scan
> node, against the normal direction of data flow. The build side of a
> join produces compact knowledge about admissible keys, and lower layers
> may use it before rows are materialized and sent upward.
>
> I saw this in my own experiments with zone/chunk-oriented storage for
> Postgres: static predicates could prune zones nicely, but joins were the
> hard case because the useful filtering knowledge was produced above the
> scan. A runtime semi-join filter pushed from the Hash Join build side
> into the scan could turn join-derived knowledge into scan-level pruning.
>
> For example:
>
>   SELECT sum(e.cost)
>   FROM events e
>   JOIN accounts a ON e.account_id = a.id <http://a.id>
>   WHERE a.region = 'NP'; -- Nepal
>
> The events scan does not know which account_id values are EU accounts.
> That knowledge is produced above it, on the build side of the join. A
> runtime semi-join filter pushed from the Hash Join build side down into
> the events scan could let the scan reject impossible account_id values
> before producing tuples.
>

Yes. This is known as "predicate transfer" in academic papers.

> For a plain heap scan this may mostly save hash probes. But with zone/
> chunk-oriented storage, where chunks have dictionaries, min/max
> metadata, Bloom summaries, or tenant ranges, the same runtime filter can
> skip whole chunks. That is the part I find most interesting: turning
> join-derived knowledge into scan-level pruning, against the normal
> direction of data flow.
>
> Bloom is just one carrier for that knowledge. The real feature is a
> pluggable runtime-filter mechanism that heap, CustomScan, FDW, columnar/
> table AMs, partitioned storage, or chunk/cold storage can consume at the
> level they understand.
>
> This may be a topic for a separate thread, because it quickly becomes
> less about Hash Join Bloom filters and more about runtime knowledge
> pushdown into storage.
>

Right, there's a general concept of a "filter", and Bloom filters are
just one example of that. And maybe we could build other types of
filters more suitable for the scan. But I think it'll still be tied to a
hash join, because what other nodes / joins can build the filter?

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilmar Yunusov 2026-06-03 10:22:19 Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
Previous Message Nisha Moond 2026-06-03 09:54:52 Re: Proposal: Conflict log history table for Logical Replication