Re: WIP: bloom filter in Hash Joins with batches

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: bloom filter in Hash Joins with batches
Date: 2015-12-31 20:38:35
Message-ID: 5685924B.6070403@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

attached is v2 of the patch, with a number of improvements:

0) This relies on the the other hashjoin patches (delayed build of
buckets and batching), as it allows sizing the bloom filter.

1) enable_hashjoin_bloom GUC

This is mostly meant for debugging and testing, not for committing.

2) Outer joins should be working fine now. That is, the results should
be correct and faster as the outer rows without matches should not
be batched at all.

3) The bloom filter is now built for all hash joins, not just when
batching is happening. I've been a bit skeptical about the
non-batched cases, but it seems that I can get a sizable speedup
(~20-30%, depending on the selectivity of the join).

4) The code is refactored quite a bit, adding BloomFilterData instead
of just sprinkling the fields on HashJoinState or HashJoinTableData.

5) To size the bloom filter, we now use HyperLogLog couter, which we
now have in core thanks to the sorting improvements done by Peter
Geoghegan. This allows making the bloom filter much smaller when
possible.

The patch also extends the HyperLogLog API a bit (which I'll submit
to the CF independently).

There's a bunch of comments in the code, mostly with ideas about more
possible improvements.

The main piece missing in the patch (IMHO) is optimizer code making
decisions whether to enable bloom filters for the hash join, based on
cardinality estimates. And also the executor code disabling the bloom
filter if they turn inefficient. I don't think that's a major issue at
this point, though, and I think it'll be easier to do based on testing
the current patch.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-extend-the-HyperLogLog-API-a-bit-by-adding-two-more-.patch text/x-diff 2.0 KB
0002-hash-bloom-filters-v2.patch text/x-diff 25.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2015-12-31 20:48:45 PATCH: Extending the HyperLogLog API a bit
Previous Message Tom Lane 2015-12-31 20:36:27 Re: IMPORT FOREIGN SCHEMA return create foreign table commands are those further filtered in LIMIT and EXCEPT cases?