Quick Links

Re: Hash Joins vs. Bloom Filters / take 2

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hash Joins vs. Bloom Filters / take 2
Date:	2018-02-20 23:23:58
Message-ID:	CAH2-Wz=FZ8r=qDX3hz5ctaDNts1+bdeVNYhXB+m+dr79XYTajQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> I've worked a lot with bloom filters, and for large false positive
> rates and large sets (multi-million entries), you get bloom filter
> sizes of about 10 bits per distinct item.

It's generally true that you need 9.6 bits per element to get a 1%
false positive rate. 1% could be considered much too low here.

Do we need to eliminate 99% of all hash join probes (that find nothing
to join on) to make this Bloom filter optimization worthwhile?
Personally, I doubt it.

> That's efficient, but it's not magic. It can still happen that the
> whole set can't fit in work_mem with an acceptable false positive
> rate.

A merge join is always going to be the better choice when extremely
memory constrained.

--
Peter Geoghegan

In response to

Re: Hash Joins vs. Bloom Filters / take 2 at 2018-02-20 23:17:28 from Claudio Freire

Responses

Re: Hash Joins vs. Bloom Filters / take 2 at 2018-02-20 23:48:11 from Claudio Freire

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	R, Siva	2018-02-20 23:30:35	Duplicate Item Pointers in Gin index
Previous Message	Claudio Freire	2018-02-20 23:17:28	Re: Hash Joins vs. Bloom Filters / take 2