Re: WIP: bloom filter in Hash Joins with batches

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: bloom filter in Hash Joins with batches
Date: 2015-12-24 13:51:15
Message-ID: CANP8+jK+rSLrDVFvkNngGa7BT0ML7d=dUkk+DappRKMTp4bRBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 December 2015 at 16:00, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On 12/17/2015 11:44 AM, Simon Riggs wrote:
>
>>
>> My understanding is that the bloom filter would be ineffective in any of
>> these cases
>> * Hash table is too small
>>
>
> Yes, although it depends what you mean by "too small".
>
> Essentially if we can do with a single batch, then it's cheaper to do a
> single lookup in the hash table instead of multiple lookups in the bloom
> filter. The bloom filter might still win if it fits into L3 cache, but that
> seems rather unlikely.
>
> * Bloom filter too large
>>
>
> Too large with respect to what?
>
> One obvious problem is that the bloom filter is built for all batches at
> once, i.e. for all tuples, so it may be so big won't fit into work_mem (or
> takes a significant part of it). Currently it's not accounted for, but
> that'll need to change.

The benefit seems to be related to cacheing, or at least that memory speed
is critical. If the hash table is too small, or the bloom filter too large
then there would be no benefit from performing the action (Lookup Bloom
then maybe Lookup Hash) compared with just doing (LookupHash).

So the objective must be to get a Bloom Filter that is small enough that it
lives in a higher/faster level of cache than the main Hash table. Or
possibly that we separate that into a two stage process so that the first
level can be applied by a GPU and then later checked against hash outside
of a GPU.

I think you also need to consider whether we use a hash bloom filter or
just simply apply an additional range predicate. The latter idea is similar
to my earlier thoughts here
http://www.postgresql.org/message-id/CA+U5nMLYf2cgbq+YUw-ArLBTcPrqanBf5QiFEC-PBRJCFzOngg@mail.gmail.com

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2015-12-24 13:57:21 Re: WIP: Rework access method interface
Previous Message Michael Paquier 2015-12-24 12:45:40 Re: Commit fest status for 2015-11