Re: WIP: bloom filter in Hash Joins with batches

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: bloom filter in Hash Joins with batches
Date:	2015-12-17 10:44:56
Message-ID:	CANP8+jKT=Vzv92mSv1Lh2tmeHyhmNBfa5xG6r1msgBT5QDf1Aw@mail.gmail.com
Lists:	pgsql-hackers

On 15 December 2015 at 22:30, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> 3) Currently the bloom filter is used whenever we do batching, but it
> should really be driven by selectivity too - it'd be good to (a)
> estimate the fraction of 'fact' tuples having a match in the hash
> table, and not to use the bloom filter if it's over ~60% or so. Also,
> maybe the code should count the matches at runtime, and disable the
> bloom filter if we reach some threshold.
>

Cool results.

It seems a good idea to build the bloom filter always, then discard it if
it would be ineffective.
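To make "build the bloom filter always" concrete, here is a minimal C sketch of a bloom filter keyed on the 32-bit hash value that a hash join already computes for each tuple. It is not the code from Tomas's patch: the struct and function names (BloomFilter, bloom_create, bloom_add, bloom_maybe_contains, bloom_free) are hypothetical, and the double-hashing scheme used to derive the k bit positions is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal bloom filter over per-tuple 32-bit hash values. */
typedef struct BloomFilter
{
    uint64_t   *words;      /* bit array, nbits bits long */
    uint32_t    nbits;      /* power of two, so we can mask instead of mod */
    int         nhashes;    /* number of bit positions per key (k) */
} BloomFilter;

/* Derive a second hash for double hashing; forcing it odd gives a nonzero
 * step that visits distinct positions in a power-of-two bit array. */
static uint32_t
bloom_mix(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h | 1;
}

static BloomFilter *
bloom_create(uint32_t nbits, int nhashes)
{
    assert(nbits >= 64 && (nbits & (nbits - 1)) == 0);
    BloomFilter *bf = malloc(sizeof(BloomFilter));
    bf->words = calloc(nbits / 64, sizeof(uint64_t));
    bf->nbits = nbits;
    bf->nhashes = nhashes;
    return bf;
}

/* Called for every inner (hash table) tuple while building. */
static void
bloom_add(BloomFilter *bf, uint32_t hashvalue)
{
    uint32_t step = bloom_mix(hashvalue);

    for (int i = 0; i < bf->nhashes; i++)
    {
        uint32_t bit = (hashvalue + (uint32_t) i * step) & (bf->nbits - 1);
        bf->words[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/* false means "definitely no match"; true means "maybe a match". */
static bool
bloom_maybe_contains(const BloomFilter *bf, uint32_t hashvalue)
{
    uint32_t step = bloom_mix(hashvalue);

    for (int i = 0; i < bf->nhashes; i++)
    {
        uint32_t bit = (hashvalue + (uint32_t) i * step) & (bf->nbits - 1);
        if ((bf->words[bit / 64] & (UINT64_C(1) << (bit % 64))) == 0)
            return false;   /* a bit is unset: this key was never added */
    }
    return true;
}

static void
bloom_free(BloomFilter *bf)
{
    free(bf->words);
    free(bf);
}
```

In a batched hash join the idea would presumably be to call bloom_add for every inner tuple, then consult bloom_maybe_contains for each outer tuple before writing it to a batch file, dropping it when the answer is false. Because a bloom filter never yields false negatives, discarding it later only costs the build effort and never affects correctness.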

My understanding is that the bloom filter would be ineffective in any of
these cases:
* The hash table is too small
* The bloom filter is too large
* Bloom selectivity is > 50%, i.e. more than half of the outer tuples pass
the filter. Perhaps that can be applied dynamically, so we stop using the
filter if it becomes ineffective
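The three cases above could be checked roughly as follows. This is a sketch under assumed thresholds: the constant values, the BloomStats struct, and the functions bloom_worth_keeping and bloom_record_probe are all hypothetical, and only the ~50% cutoff comes from the message itself. The first two cases are checked once after the build; the third is tracked per probe and switches the filter off at runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds; only the 50% cutoff is taken from the email. */
#define BLOOM_MIN_INNER_TUPLES   1000                /* "hash table too small" */
#define BLOOM_MAX_BYTES          (8 * 1024 * 1024)   /* "bloom filter too large" */
#define BLOOM_MAX_PASS_FRACTION  0.5                 /* "selectivity > 50%" */
#define BLOOM_WARMUP_PROBES      10000               /* probes before judging */

typedef struct BloomStats
{
    bool        enabled;    /* still consulting the filter? */
    uint64_t    nprobed;    /* outer tuples checked against the filter */
    uint64_t    npassed;    /* of those, how many the filter let through */
} BloomStats;

/* Static checks, applied once the filter has been built. */
static bool
bloom_worth_keeping(uint64_t inner_tuples, uint64_t filter_bytes)
{
    if (inner_tuples < BLOOM_MIN_INNER_TUPLES)
        return false;       /* hash table too small to be worth filtering */
    if (filter_bytes > BLOOM_MAX_BYTES)
        return false;       /* filter itself uses too much memory */
    return true;
}

/* Dynamic check: record one probe result and, after a warm-up period,
 * disable the filter if it passes more than half of the outer tuples. */
static void
bloom_record_probe(BloomStats *st, bool passed)
{
    if (!st->enabled)
        return;
    st->nprobed++;
    if (passed)
        st->npassed++;
    if (st->nprobed >= BLOOM_WARMUP_PROBES &&
        (double) st->npassed / (double) st->nprobed > BLOOM_MAX_PASS_FRACTION)
        st->enabled = false;
}
```

The warm-up period avoids disabling the filter based on the first handful of outer tuples, which may not be representative; once disabled, the join simply skips the filter lookup for the remaining outer tuples.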

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

WIP: bloom filter in Hash Joins with batches at 2015-12-15 22:30:06 from Tomas Vondra

Responses

Re: WIP: bloom filter in Hash Joins with batches at 2015-12-17 16:00:47 from Tomas Vondra
