Quick Links

Re: hashjoins vs. Bloom filters (yet again)

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: hashjoins vs. Bloom filters (yet again)
Date:	2026-07-01 13:25:25
Message-ID:	ef511890-33f2-4ec8-be95-7438ed183541@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 6/2/26 17:46, Tomas Vondra wrote:
> On 5/31/26 17:03, Andrew Dunstan wrote:
>>
>> ...
>
> OK. I think it's enough for testing, i.e. to see if it's actually worth
> pursuing further. But I think we'll eventually need to solve the issues
> planning/costing somehow. I'm not sure it'll be committable without
> having some sort of solution.
>
> I happened to find this 2025 paper:
>
> Including Bloom Filters in Bottom-up Optimization
> https://arxiv.org/html/2505.02994v1
>
> I read it over the weekend, and interestingly enough it's exactly about
> the planning issues I outlined last week, i.e. difficulties with costing
> paths that might include pushed-down Bloom filters.
>
> They even describe a solution that kinda looks a bit like "tracking a
> separate set of paths" from my e-mail, although they use somewhat
> different terminology (sub-plan == our path, etc.). But if you squint a
> little bit, it talks about the costing issue, path explosion, etc.
>
> Their solutions is some sort of two-phase process, which I'm not sure we
> can do. It'd require a fundamental rework of how we construct join rels
> and all that, and TBH I don't have an ambition to do that.
>
> But while reading the paper, I kept thinking about how we deal with
> pathkeys. I wonder if we could do something similar to that? That is,
> have a concept of "potentially interesting" filters, and construct the
> extra paths only for those, to limit the number of extra paths.
>
> Imagine we construct the the baserels (essentially scan nodes), and then
> do a pass over those. Each scan would look at what joins it participates
> in, and which of those could benefit from a Bloom filter (some can't,
> because it's a FK join, or we don't expect many rejected tuples, or
> maybe it's a LEFT JOIN, ... etc.).
>
> And then we'd maybe have some additional heuristics to pick which "Bloom
> filters" to attach to the path. And then later, when planning that
> particular join involving that path, we'd reject the join if it's not a
> hash join. The scan would always have to construct a "clean path" not
> requiring any filters, similarly to what custom scans need to do for
> parallel paths.
>
> It's just a rough idea, but I think it would work. Worth a try.
>

I kept thinking about how we could improve the v1 patch (with filter
pushdown to scans deep below the join), so that it does the decisions
"properly" during planning, i.e. during the bottom-up phase when
constructing paths. I kept going back to PathKeys as a precedent for
considering this kind of Path feature, so I gave that a try.

Attached is a v3, a PoC patch reworking the v1 patch in this way. The
patch looks pretty large (~300kB), but the vast majority of that is
churn in regression tests. And also a lot of FIXME/XXX comments
explaining remaining issues or opportunities for improvement. The actual
code changes are rather modest (maybe ~1000 lines).

It also looks much larger because I chose to include v1 as 0001, with
0002 reworking it. But 0002 undoes a lot of the changes made by 0001,
and the resulting diff is way *smaller* than either of the parts.

So it really is not that large in the end.

Note: Most of the changes in regression tests is due to plan changes,
where a Bloom filter gets pushed down, and maybe it even changes larger
plan changes (e.g. because it makes hashjoin cheaper, and so it wins).
There's also changes in results, because the ordering changes (which is
fine, if the plan switches to a hashjoin). I haven't investigated all
the individual changes in detail, but those that I looked at seem OK.

The v1 patch (still included as 0001) did the pushdown in createplan.c,
in create_hashjoin_plan. And that's problematic for the various reasons
I explained in the previous message (costing, missing better plans, ...)

The v2 patch (in 0002) reworks that to actually determine the filters
during the bottom-up phase.

In short:

(a) Path gets "expected_filters" list with filters it expects to be
satisfied by a later join.

(b) While creating paths for scans, we now also generate paths for
interesting filters. This happens in set_rel_pathlist(), it looks at
joins the relation participates in, decides which are selective enough,
and adds a couple paths for combinations of expected filters.

The default is to pick 3 selective filters, and generate all
combinations. So ~8 new paths for each scan path. There's a bunch of
open questions regarding which filters to pick, how many, which
combinations to create.

(c) While creating paths for joins, we also consider these additional
paths with expected filters. A join may either "satisfy" a filter (if
it's a hashjoin and the filter is pushed down by the join), or it can
propagate it (if it's pushed from a later join).

This means nestloop/mergejoin "discard" paths with filters matching that
join, but can still propagate the other filters.

The paths with filters don't compete with regular paths, so we still
have the regular paths with no filters. So there should be no risk of
not being able to find a plan, or something.

(d) At planning / execution this works similarly to v1, except that all
the decisions were already done. The code in createplan.c and executor
is more for book-keeping and allowing lookup of filters.

There's more details in the commit message and the various comments,
along with a bunch of XXX comments about needed improvements.

I haven't incorporated the two patches posted by Andrew:

1) making it work with CustomScans

2) supporting per-key filters

3) allow eager creation of filters (disable delayed Hash build)

I agree those seem like a worthwhile improvements, and the patches
seemed to be OK too, but I was focusing on reworking the planning. Based
on some off-list discussion, Andrew (or one of his colleagues) should be
able to adjust those for this v3 patch.

While I think this v3 approach is generally correct, there certainly is
enough issues to solve.

For example, I really dislike how generate_expected_filter_paths()
constructs the new paths by "cloning" the existing paths. That works,
but does not seem right - for example, the scan may be able to do
something smart with the filter (I presume this is why Andrew wanted the
per-key filters). In which case it probably would like to adjust the
costing accordingly. But with the memcpy() clone that's not possible. So
we'd need something better.

v3 also does not do much regarding the filter tradeoffs (larger filter
vs. higher false positive). Or even setting a memory limit.

In fact, I'm thinking it might be better to abstract the "filter" and
stop thinking just about Bloom filters. There probably are other kinds
of interesting filters, so maybe we should not assume all filters are
Bloom filters. Those filters probably won't be that different, it's more
about not putting "Bloom" in naming etc.

There's a lot more XXX comments, discussing all kinds of ideas (or stuff
that needs improving). I'm not going to repeat all of that here.

While working on the v3 changes, I've been looking for papers discussing
this sort of things. Because we're certainly not the only (or first)
database considering this. Aside from the paper I already mentioned in
my previous message:

Including Bloom Filters in Bottom-up Optimization
https://arxiv.org/html/2505.02994v1

I found two more related recent papers:

Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries
https://vldb.org/cidrdb/papers/2024/p22-yang.pdf

Debunking the Myth of Join Ordering: Toward Robust SQL Analytics
https://dl.acm.org/doi/pdf/10.1145/3725283

These papers talk about "predicate transafer", but that's really the
same thing as building and pushing down Bloom filters. Because that
effectively "transfers" the predicates (WHERE conditions) from the join
to the other part of the query.

I have not read those papers in detail yet, but I have a hunch they may
be answers to some of the open questions (e.g. a better heuristics how
to pick the interesting filters, etc.).

regards

--
Tomas Vondra

Attachment	Content-Type	Size
v3-0001-PoC-hashjoin-bloom-filter-pushdown.patch	text/x-patch	280.8 KB
v3-0002-PoC-Bloom-filter-pushdown-during-path-constructio.patch	text/x-patch	291.2 KB

In response to

Re: hashjoins vs. Bloom filters (yet again) at 2026-06-02 15:46:56 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	jian he	2026-07-01 13:44:37	Re: implement CAST(expr AS type FORMAT 'template')
Previous Message	Dilip Kumar	2026-07-01 13:25:23	Re: Proposal: Conflict log history table for Logical Replication