From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Joins vs. Bloom Filters / take 2
Date: 2018-02-20 23:23:58
Message-ID: CAH2-Wz=FZ8r=qDX3hz5ctaDNts1+bdeVNYhXB+m+dr79XYTajQ@mail.gmail.com
Lists: pgsql-hackers
On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> I've worked a lot with bloom filters, and for large false positive
> rates and large sets (multi-million entries), you get bloom filter
> sizes of about 10 bits per distinct item.
It's generally true that you need about 9.6 bits per element to get a 1%
false positive rate, but 1% is arguably far stricter than we need here.
Do we need to eliminate 99% of all hash join probes (that find nothing
to join on) to make this Bloom filter optimization worthwhile?
Personally, I doubt it.
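(For reference, the ~9.6 bits/element figure follows from the standard
sizing formula for an optimally configured Bloom filter, m/n = -ln(p) /
(ln 2)^2, where p is the target false positive rate. A quick illustrative
sketch, not part of the original mail:)

```python
import math

def bloom_bits_per_element(p):
    """Bits per element for an optimally sized Bloom filter with
    false positive rate p: m/n = -ln(p) / (ln 2)^2."""
    return -math.log(p) / (math.log(2) ** 2)

# A 1% target needs ~9.6 bits/element; relaxing the target shrinks
# the filter considerably, which is the point being made above.
for p in (0.01, 0.05, 0.10):
    print(f"p = {p:.2f}: {bloom_bits_per_element(p):.2f} bits/element")
```

Relaxing p from 1% to 10% roughly halves the filter size (about 4.8
bits/element), so a less aggressive target can still filter most
unmatched probes while fitting in far less memory.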
> That's efficient, but it's not magic. It can still happen that the
> whole set can't fit in work_mem with an acceptable false positive
> rate.
A merge join is always going to be the better choice when extremely
memory constrained.
--
Peter Geoghegan