Re: Hash Joins vs. Bloom Filters / take 2

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Joins vs. Bloom Filters / take 2
Date: 2018-02-22 11:44:42
Message-ID: CAGTBQpYb5wAAqC4JTBE5DSw=5_TZpfqzqTZYq0Dphs3_0zUM6Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 21, 2018 at 11:21 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 02/21/2018 02:10 AM, Peter Geoghegan wrote:
>> On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>> I suspect that it could make sense to use a Bloom filter to
>>>> summarize the entire inner side of the join all at once, even
>>>> when there are multiple batches. I also suspect that this is
>>>> particularly beneficial with parallel hash joins, where
>>>> IPC/synchronization overhead can be a big problem.
>>>>
>>>
>>> But that's what the patch does, currently - the filter is built
>>> during the initial pass through the data, and then used for all
>>> batches.
>>
>> I misunderstood. I would probably do something like double or triple
>> the original rows estimate instead, though. The estimate must be at
>> least slightly inaccurate when we get to this point, but I don't
>> think that that's a good enough reason to give up on the estimate
>> completely.
>>
>
> That's a problem only for the multi-batch case, though.
>
> With a single batch we can walk the hash table and count non-empty
> buckets, to get a good ndistinct estimate cheaply. And then size the
> filter considering both memory requirements (fits into CPU cache) and
> false positive rate. There are other things we may need to consider
> (memory usage vs. work_mem) but that's a separate issue.
>
> With multiple batches I think we could use the "size the bloom filter
> for a fraction of work_mem" which the current patch uses when switching
> to multiple batches halfway-through. That pretty much entirely ignores
> the estimate and essentially replaces it with a "fictional" estimate.
>
> I think that's a better approach than using some arbitrary multiple of
> the estimate. When we have to start batching halfway through, the
> estimate is proven to be rather bogus anyway, but we may treat it as a
> lower boundary for the bloom filter size.

...

>> As I said, X should not be a portion of work_mem, because that has
>> only a weak relationship to what really matters.
>>
>
> I agree a fixed fraction of work_mem may not be the right thing, but the
> goal was to make the bloom filter part of the Hash memory budget, i.e.
>
> bloom filter + hash table <= work_mem
>
> (which I think we agree should be the case), without increasing the
> number of batches too much. For example, if you size the filter ignoring
> this, and it ends up being 90% of work_mem, you may need to do the hash
> join in 128 batches instead of just 16. Or something like that.
>
> Maybe that would still be a win, though. Firstly, the higher number of
> batches may not have a huge impact - in one case we need to serialize
> 15/16 and in the other one 127/128. That's 93% vs. 99%. And if the more
> accurate filter allows us to discard much more data from the outer
> relation ...
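
To put made-up numbers on that: with work_mem = 64MB, a filter eating
~58MB leaves only ~6MB for the hash table, so you need roughly 10x as
many batches (16 -> 128 in your example, give or take a doubling); and
since only the current batch stays in memory, the spilled fraction goes
from 15/16 to 127/128.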

Let me reiterate: you can avoid both issues with scalable bloom filters [1].
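
The core of the idea, very roughly sketched (BloomFilter and the
bloom_* helpers are hypothetical stand-ins for a plain filter, not
anything from the patch):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct BloomFilter BloomFilter;     /* plain filter, elided */
    extern BloomFilter *bloom_create(uint64_t capacity, double error_rate);
    extern void bloom_add(BloomFilter *f, uint32_t hash);
    extern bool bloom_contains(BloomFilter *f, uint32_t hash);

    /*
     * A scalable bloom filter is a chain of plain filters.  When the
     * newest one reaches the capacity it was sized for, chain a bigger
     * one with a tighter per-filter error rate, so the compound false
     * positive rate stays bounded no matter how wrong the initial
     * estimate was.
     */
    typedef struct ScalableBloom
    {
        BloomFilter *filters[32];   /* chain; 32 doublings is plenty */
        int          nfilters;
        uint64_t     capacity;      /* capacity of the newest filter */
        uint64_t     nitems;        /* items added to the newest filter */
        double       error_rate;    /* error rate of the newest filter */
    } ScalableBloom;

    #define SBF_GROWTH      2       /* capacity doubles per filter */
    #define SBF_TIGHTENING  0.5     /* error rate halves per filter */

    static void
    sbf_init(ScalableBloom *sbf, uint64_t capacity, double error_rate)
    {
        sbf->nfilters = 0;
        sbf->capacity = capacity;
        sbf->nitems = 0;
        sbf->error_rate = error_rate;
        sbf->filters[sbf->nfilters++] = bloom_create(capacity, error_rate);
    }

    static void
    sbf_add(ScalableBloom *sbf, uint32_t hash)
    {
        if (sbf->nitems >= sbf->capacity)
        {
            /* newest filter is full: chain a bigger, stricter one */
            sbf->capacity *= SBF_GROWTH;
            sbf->error_rate *= SBF_TIGHTENING;
            sbf->filters[sbf->nfilters++] =
                bloom_create(sbf->capacity, sbf->error_rate);
            sbf->nitems = 0;
        }
        bloom_add(sbf->filters[sbf->nfilters - 1], hash);
        sbf->nitems++;
    }

    static bool
    sbf_maybe_contains(ScalableBloom *sbf, uint32_t hash)
    {
        /* a key may sit in any filter of the chain */
        for (int i = 0; i < sbf->nfilters; i++)
            if (bloom_contains(sbf->filters[i], hash))
                return true;
        return false;
    }

You never size it up front from a dubious estimate, memory grows only
as keys are actually added, and capping it against work_mem is just a
matter of not chaining another filter.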

An HLL can be used to estimate the set size (i.e. the number of distinct
keys added so far); the paper makes no mention of it, probably because
it assumes only distinct items are added to the set.
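
PostgreSQL already has an HLL in src/backend/lib/hyperloglog.c, so
something like this (sketch only, API quoted from memory) would give a
cheap running ndistinct while the filter is being built:

    #include "postgres.h"
    #include "lib/hyperloglog.h"

    /*
     * Sketch: feed every inner-side hash value to an HLL while building
     * the filter.  Duplicate keys barely move the estimate, so it tracks
     * the number of distinct keys actually added and can drive the
     * filter's sizing/growth decisions.
     */
    static double
    estimate_inner_ndistinct(const uint32 *hashvalues, size_t nvalues)
    {
        hyperLogLogState hll;
        double      ndistinct;
        size_t      i;

        initHyperLogLog(&hll, 10);  /* 2^10 registers, ~1kB, ~3% error */

        for (i = 0; i < nvalues; i++)
            addHyperLogLog(&hll, hashvalues[i]);

        ndistinct = estimateHyperLogLog(&hll);
        freeHyperLogLog(&hll);

        return ndistinct;
    }

In the join you'd of course not batch the values up like this, just
call addHyperLogLog() on each hash value as the tuple goes into the
hash table / filter, and read the estimate whenever a growth decision
is needed.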

[1] http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
