Re: Bloom Filter lookup for hash joins

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bloom Filter lookup for hash joins
Date: 2013-06-26 12:01:42
Message-ID: CAOeZVieQTEJpU55T7zCMdUEzGjx9dzsP_6rzmuCuQYuDsJ7LBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> One idea that I had was to use them to do CLOG lookups from smaller
> datastructures. You could store the list of aborted xids in bloom
> filters. When a xid is not found in the filter, then it is known to
> have committed, if the filter returns true, then you have to check the
> real CLOG. This would achieve for example a 1:8 compression ratio at
> 99% commit rate and 1:160'000 false positive rate, or 1:16 compression
> ratio at 1:400 false positive rate. Nothing too exciting, so I didn't
> really develop the idea any further.
>
Right.

>
> The problem here is that if the hash table is in memory, doing a hash
> table lookup directly is likely to be cheaper than a bloom filter
> lookup, even if the bloom filter fits into the processor cache and the
> hash table doesn't (10 last level cache hits is slower than one cache
> miss). Bloom filter will help when its not feasible to use an actual
> hash table (lots of large keys), the real lookup is slow (e.g. an
> index lookup), you are doing a lot of lookups to amortize the
> construction cost and the condition is expected to be selective (most
> lookups give a negative). There might be some dataware house types of
> queries where it would help, but it seems like an awfully narrow use
> case with a potential for performance regressions when the planner has
> a bad estimate.

Ok, sounds good. Cant we use bloom filters for the case where the hash
table doesnt fit in memory? Specifically, when reading from disk is
inevitable for accessing the hash table, we can use bloom filters for
deciding which keys to actually read from disk.Thus ,we save I/O costs
which we would have incurred when reading the keys which are not the
ones we want and bloom filter gives us a negative.

Regards,

Atri

--
Regards,

Atri
l'apprenant

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-06-26 12:09:07 Re: Move unused buffers to freelist
Previous Message Amit Kapila 2013-06-26 11:57:33 Re: Review: Patch to compute Max LSN of Data Pages