Re: Hash Joins vs. Bloom Filters / take 2

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Joins vs. Bloom Filters / take 2
Date: 2018-02-21 07:17:54
Message-ID: CAEepm=0N6DODN7nx6Zb93YOW-y=RftNNFZJRaLyG6jbJHJVjsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 21, 2018 at 10:23 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> In 2015/2016 I've been exploring if we could improve hash joins by
> leveraging bloom filters [1], and I was reminded about this idea in a
> thread about amcheck [2]. I also see that bloom filters were briefly
> mentioned in the thread about parallel hash [3].
>
> So I've decided to revive the old patch, rebase it to current master,
> and see if we can resolve the issues that killed it in 2016.

Nice!

> Opinions?

I'm definitely following this and interested in helping in some way if
I can. I have wondered about this subject and discussed it a bit with
Peter Geoghegan off-list.

Some assorted thoughts:

In the old thread, Peter pointed at a curious undergrad student
project from 2008[1] evaluating Bloom filters for hash joins in
PostgreSQL 8.3, inspired by a couple of older papers[2][3]. While
your patch uses a Bloom filter to short-circuit the regular bucket
probe in ExecHashJoinImpl(), these approach push the Bloom filter down
into the outer relation scan. I suspect you're right about the fixed
sizing being a problem, but the general idea seems pretty interesting
to me and there seems to be no reason you couldn't make the filter
size dynamic as you have it and then share it via a parameter or
something. But is there any point?

On the one hand, pushing down Bloom filters requires the hash value to
be computed by the lower scan, and then computed again if the tuple
survives the filter and makes it into the Hash Join node (unless there
is some way to attach it to the tuple...). On the other hand,
throwing away tuples sooner can avoid more work, particularly in the
case of multi-joins.

Based on a hint from Jim Finnerty at PGCon 2017 and also some light
reading of ancient stuff on join pipelining, I've been wondering how
the potential effectiveness of Bloom filters is affected by the fact
that we prefer left-deep nested hash joins and for us build relation =
right/inner relation. That policy means that we build hash tables for
all the joins up front, which means we could potentially push all
their Bloom filters down to the ultimate outer scan (or as far down as
possible depending on the qual) as I now find described in [2]. So:

H1
/\
H2 u
/\
H3 t
/\
r s

You might be able to push filters B1, B2, B3 constructed from H1, H2,
H3 all the way down to the scan of r and then probe them in order of
selectivity (a bit like order_qual_clauses does with any other kind of
qual). (Because of the empty-outer optimisation's preliminary attempt
to pull a tuple on the left I guess you might need to push down a
placeholder filter first and then later replace it with the real
filter if/when you eventually build it.)

In contrast to that situation, suppose we introduced whole-query
memory budgets as Peter Geoghegan and several others have proposed.
Then we'd suddenly have a potential reason to prefer right-deep plans:

H1
/\
r H2
/\
s H3
/\
t u

Many other RDBMSs would consider this (modulo some confusion about
hash join polarity: other systems and literature have build = inner =
left, probe = outer = right so they'd draw that the other way around
and call it left deep, somewhat distractingly...) because it only
requires two hash tables to be loaded into memory at a time, a useful
property if you want to stay inside a whole-query memory budget.
That's because the output of each hash join is the input to the hash
table above it, so in theory we only need the one we're building and
the one we're probing at each moment. Whatever other pros and cons
might exist with this plan shape compared to the left-deep variant, my
point is that a right-deep join could only push each filter down to
the immediate scan of r, s, t, which wouldn't be much of a potential
improvement over the way your current patch does it (ie entirely
inside the hash join, no push down), because it's not able to move the
Bloom filter very far away from the hash join anyway. At best it
could perhaps move it before some more expensive/less selective other
quals in the scan.

In other words, I wonder if our left-deep plans stand to gain more
from Bloom filter push down.

[1] http://www.nus.edu.sg/nurop/2010/Proceedings/SoC/NUROP_Congress_Cheng%20Bin.pdf
[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.8069&rep=rep1&type=pdf
[3] https://pdfs.semanticscholar.org/ec99/6093805012b9e0ff06bb2050436091671091.pdf

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2018-02-21 07:20:00 RE: Speed up the removal of WAL files
Previous Message Andres Freund 2018-02-21 05:34:26 Re: pgsql: Avoid valgrind complaint about write() of uninitalized bytes.