Bloom Filter lookup for hash joins

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Bloom Filter lookup for hash joins
Date: 2013-06-26 06:46:03
Message-ID: CAOeZVif-R-iLF966wEipk5By-KhzVLOqpWqurpaK3p5fYw-Rdw@mail.gmail.com
Lists: pgsql-hackers

Hi All,

I have been researching bloom filters and discussed them on IRC with
RhodiumToad and David Fetter. They pointed me to various places that
could potentially use bloom filters, apart from the places that
already have them.

I have been reading the current implementation of hash joins. In
ExecScanHashBucket, which I understand is the actual lookup function,
we could potentially consult a bloom filter per bucket. Instead of
actually looking up the hash value of each outer tuple in the bucket,
we could first check the bloom filter for that bucket, and only if we
get a positive look up the actual values, i.e. continue with our
current behaviour (since the positive could be a false positive).
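
To make the idea concrete, here is a minimal standalone sketch in C. It
is not PostgreSQL code: ToyHashTable, ToyTuple, toy_hash, toy_insert and
toy_probe are hypothetical names made up for illustration, and the
choice of one 64-bit filter word per bucket with two bits per tuple is
only an assumption. The probe checks the bucket's filter first and walks
the chain only on a positive.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_NBUCKETS 8          /* hypothetical size; must be a power of two */

typedef struct ToyTuple
{
    struct ToyTuple *next;      /* next tuple in the same bucket chain */
    uint32_t    hashvalue;      /* full hash value of the join key */
    int         key;            /* the join key itself */
} ToyTuple;

typedef struct ToyHashTable
{
    ToyTuple   *buckets[TOY_NBUCKETS];  /* chained buckets */
    uint64_t    filters[TOY_NBUCKETS];  /* one 64-bit bloom filter per bucket */
    long        chain_walks;            /* probes that had to walk a chain */
} ToyHashTable;

/* Stand-in integer hash (a common 32-bit bit-mixing finalizer). */
static uint32_t
toy_hash(int key)
{
    uint32_t    h = (uint32_t) key;

    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/*
 * Two filter bits per tuple, taken from hash bits 8..13 and 14..19.
 * The low bits are avoided because they already select the bucket.
 */
static uint64_t
toy_filter_mask(uint32_t hashvalue)
{
    return (UINT64_C(1) << ((hashvalue >> 8) & 63)) |
           (UINT64_C(1) << ((hashvalue >> 14) & 63));
}

/* Build side: link the tuple into its bucket and set its filter bits. */
static bool
toy_insert(ToyHashTable *ht, int key)
{
    ToyTuple   *tuple = malloc(sizeof(ToyTuple));
    uint32_t    h = toy_hash(key);
    uint32_t    bucketno = h & (TOY_NBUCKETS - 1);

    if (tuple == NULL)
        return false;
    tuple->hashvalue = h;
    tuple->key = key;
    tuple->next = ht->buckets[bucketno];
    ht->buckets[bucketno] = tuple;
    ht->filters[bucketno] |= toy_filter_mask(h);
    return true;
}

/* Probe side: check the filter first, walk the chain only on a positive. */
static bool
toy_probe(ToyHashTable *ht, int key)
{
    uint32_t    h = toy_hash(key);
    uint32_t    bucketno = h & (TOY_NBUCKETS - 1);
    uint64_t    mask = toy_filter_mask(h);
    ToyTuple   *tuple;

    if ((ht->filters[bucketno] & mask) != mask)
        return false;           /* definite negative: skip the chain */

    ht->chain_walks++;          /* positive, possibly a false positive */
    for (tuple = ht->buckets[bucketno]; tuple != NULL; tuple = tuple->next)
    {
        if (tuple->hashvalue == h && tuple->key == key)
            return true;
    }
    return false;
}
```

With a zero-initialized ToyHashTable, calling toy_insert for every inner
key and then toy_probe for every outer key gives the same answers as a
plain chained lookup; chain_walks counts how many probes the filter
could not rule out.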

This doesn't involve a lot of new logic. We need to add a bloom filter
to HashJoinState and populate it while the hash table is being built.
One thing we also need to look at is the set of hash functions used
for the bloom filters. This can be a point of further discussion.
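
As one possible starting point for that discussion, here is a sketch of
the well-known double-hashing trick, which derives several bit positions
from a single 32-bit hash value instead of running several independent
hash functions. This is a standard technique and not something already
agreed on for this proposal; bloom_bit_position and its parameters are
hypothetical names.

```c
#include <stdint.h>

/*
 * Return the i-th bit position (0 <= result < nbits) for a tuple whose
 * join key hashed to hashvalue.  Position i is (h1 + i * h2) mod nbits,
 * where h1 is the hash itself and h2 is the hash with its two 16-bit
 * halves swapped, forced odd so that the stride is never zero.
 * Arithmetic is unsigned 32-bit and wraps around on overflow.
 */
static uint32_t
bloom_bit_position(uint32_t hashvalue, uint32_t i, uint32_t nbits)
{
    uint32_t    h1 = hashvalue;
    uint32_t    h2 = ((hashvalue >> 16) | (hashvalue << 16)) | 1U;

    return (h1 + i * h2) % nbits;
}
```

A filter with k probes per tuple would call this for i = 0 .. k-1, both
when setting bits at build time and when testing them at probe time.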

The major potential benefit is that bloom filters never give false
negatives. So whenever the corresponding bloom filter gives a negative,
we can skip the lookup entirely, which could save a lot of lookups.

This approach could also be particularly useful for hash anti joins,
where we look for negatives. Since bloom filters are compact and can
be tested with plain bit operations, we could be looking at a lot of
saving here.

I am not sure if we already do something like this. Please direct me
to the relevant parts in the code if we already do.

Regards,

Atri

--
Regards,

Atri
l'apprenant
