Re: Boom filters for hash joins (was: A design for amcheck heapam verification)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Boom filters for hash joins (was: A design for amcheck heapam verification)
Date: 2017-09-19 13:28:00
Message-ID: 9b9fd273-18e7-2b07-7aa1-4b00ab59b8d1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 09/19/2017 02:55 AM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 5:13 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> On Mon, Sep 18, 2017 at 2:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, Sep 18, 2017 at 1:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Uh, why does the planner need to be involved at all?
>>>
>>> Because it loses if the Bloom filter fails to filter anything. That's
>>> not at all far-fetched; consider SELECT * FROM a.x, b.x WHERE a.x =
>>> b.x given a foreign key on a.x referencing b(x).
>>
>> Wouldn't a merge join be a lot more likely in this case anyway? Low
>> selectivity hash joins with multiple batches are inherently slow; the
>> wasted overhead of using a bloom filter may not matter.
>>
>> Obviously this is all pretty speculative. I suspect that this could be
>> true, and it seems worth investigating that framing of the problem
>> first.
>
> ISTR Tomas Vondra doing some experiments with this a few years ago and
> finding that it was, in fact, a problem.
>

You seem to have better memory than me, but you're right - I did some
experiments with this in 2015, the WIP patch and discussion is here:

https://www.postgresql.org/message-id/5670946E.8070705@2ndquadrant.com

The whole idea was that with a bloom filter we can reduce the amount of
tuples (from the outer relation) written to batches.

The patch is fairly simple, and did not try to push the bloom filters to
scan nodes or anything like that. It might be a meaningful first step,
though, particularly for selective joins (where only small number of
rows from the outer relation has a match in the hash table).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robins Tharakan 2017-09-19 14:07:00 Re: psql - add ability to test whether a variable exists
Previous Message Amit Kapila 2017-09-19 13:26:29 Re: Setting pd_lower in GIN metapage