Re: hashjoins vs. Bloom filters (yet again)

From:	"Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To:	"Tomas Vondra" <tomas(at)vondra(dot)me>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: hashjoins vs. Bloom filters (yet again)
Date:	2026-07-03 12:50:57
Message-ID:	DJOY6W2NBO0W.1MV2XB4QEU1ZD@gmail.com
Lists:	pgsql-hackers

On Fri Jul 3, 2026 at 7:10 AM -03, Tomas Vondra wrote:
>> I'm wondering if the adaptive probing can mess up the execution of the
>> chosen plan. Let's say that a plan was chosen with bloom filters pushed
>> down because they would eliminate e.g. 50% of the rows. What if at
>> runtime the bloom filter proves to be ineffective and gets disabled,
>> making the scan node produce 100% of the rows for the node above, which
>> is not expecting them? Do we end up with the same "expected vs. actual
>> rows" issue?
>>
>> I think that if we keep using the ineffective bloom filter the scan node
>> will still produce more rows than expected, but perhaps this is easier
>> to understand?
>>
>
> Yes. If we pick a filter expecting it to eliminate 99% of tuples, but
> then find it does not eliminate any tuples, that'll affect the counts in
> explain. But that's simply how misestimates work, it's not specific to
> filter pushdown. It may happen for WHERE clauses etc.
>
> The estimate would hit us at some point anyway - it's the selectivity of
> the join, so even without the pushdown the cardinality would be off
> above the join.
>
> Also, what else could we do? I don't think we can do planning assuming
> the estimates are off - we have to assume the estimates are OK. We might
> consider how "safe" a given estimate is, and then maybe not push down
> filters for "risky" ones, but we don't have any such capability.
>

Yeah, I was just worried that this may add more confusion, but after
thinking about it more, that does not seem to be the case. I was imagining
a scenario where the expected vs. actual row counts are far apart and it
is hard to tell whether that is due to outdated stats or because the
adaptive probing decided to turn off the filter pushdown. But I think
that in the end it's all the same problem: the adaptive probing may turn
off the filter because it's not very effective, and that can happen
because outdated stats made the planner think the pushed-down filter
would help when it does not.
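
To make that concrete, here is a minimal standalone sketch of the idea. It
is not taken from the patch: AdaptiveFilterState, the window size, the
min_reject_frac threshold and simulate_scan() are all hypothetical names
and values I made up for illustration. The filter is switched off once a
window of probes shows it rejects too few tuples, and in this sketch it is
never switched back on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filter bookkeeping; not the structure used by the patch. */
typedef struct AdaptiveFilterState
{
	uint64_t	probes;			/* tuples checked in the current window */
	uint64_t	rejected;		/* tuples eliminated in the current window */
	uint64_t	window;			/* re-evaluate after this many probes */
	double		min_reject_frac;	/* keep the filter only above this */
	bool		enabled;		/* false once the filter is turned off */
} AdaptiveFilterState;

static void
adaptive_filter_init(AdaptiveFilterState *st, uint64_t window,
					 double min_reject_frac)
{
	assert(window > 0);
	assert(min_reject_frac >= 0.0 && min_reject_frac <= 1.0);

	st->probes = 0;
	st->rejected = 0;
	st->window = window;
	st->min_reject_frac = min_reject_frac;
	st->enabled = true;
}

/*
 * Account for one probe. "filter_matches" is the result of the (assumed)
 * bloom filter lookup. Returns true if the scan should emit the tuple.
 */
static bool
adaptive_filter_check(AdaptiveFilterState *st, bool filter_matches)
{
	if (!st->enabled)
		return true;			/* filter is off: every tuple passes */

	st->probes++;
	if (!filter_matches)
		st->rejected++;

	/* At the end of each window, decide whether the filter pays off. */
	if (st->probes >= st->window)
	{
		double		frac = (double) st->rejected / (double) st->probes;

		if (frac < st->min_reject_frac)
			st->enabled = false;
		st->probes = 0;
		st->rejected = 0;
	}

	return filter_matches;
}

/*
 * Toy scan: row i matches the filter when i is a multiple of pass_every.
 * Returns the number of rows the scan emits to the node above.
 */
static uint64_t
simulate_scan(AdaptiveFilterState *st, uint64_t nrows, uint64_t pass_every)
{
	uint64_t	emitted = 0;

	assert(pass_every > 0);
	for (uint64_t i = 0; i < nrows; i++)
	{
		if (adaptive_filter_check(st, (i % pass_every) == 0))
			emitted++;
	}
	return emitted;
}
```

With pass_every = 1 the filter rejects nothing, so it gets disabled after
the first window and the scan emits every row. That is the same row count
it would emit with the useless filter left on, so the "expected vs. actual
rows" gap comes from the misestimate, not from the probing turning the
filter off.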

>
>>> I haven't incorporated the two patches posted by Andrew:
>>>
>>> 1) making it work with CustomScans
>>>
>>> 2) supporting per-key filters
>>>
>>> 3) allow eager creation of filters (disable delayed Hash build)
>>>
>>> I agree those seem like worthwhile improvements, and the patches
>>> seemed to be OK too, but I was focusing on reworking the planning. Based
>>> on some off-list discussion, Andrew (or one of his colleagues) should be
>>> able to adjust those for this v3 patch.
>>>
>>
>> I'm attaching a new v4 patchset incorporating Andrew's patches with test
>> cases. 0001 and 0002 are your v3 untouched, 0003 adds some tests to
>> exercise the CustomScan path, and 0004 is Andrew's changes with a few
>> adjustments required by the v3 version:
>>
>> Unlike the v1 PoC that pushed filters down in create_hashjoin_plan
>> (where it could simply walk the finished plan tree and accept any scan
>> node), the filters are now decided during bottom-up path construction,
>> so a scan only receives a filter if a filter-bearing path was generated
>> for its base relation. So the main change is teaching path generation
>> about the custom scan.
>>
>
> Thanks, I'll take a look early next week. My plan is to create a branch
> carrying all the pieces, and then gradually refine that.
>

Good, thanks.

> One thing I'm a bit concerned about is the CustomScan support. We don't
> have a single module in the tree, so how would we test the changes? I
> think it might be necessary to introduce a minimal CustomScan module, so
> that we can test the new pieces.
>

In 0003 I've created a new test extension that registers a CustomScan
method. It also exposes some SQL functions to assert some behaviours of
filter pushdown. I think it's still at an early stage, but it may help.
Let me know what you think.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

In response to

Re: hashjoins vs. Bloom filters (yet again) at 2026-07-03 10:10:33 from Tomas Vondra
