Re: WIP: BRIN bloom indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Nico Williams <nico(at)cryptonector(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: WIP: BRIN bloom indexes
Date: 2017-10-28 01:16:51
Message-ID: 7db089c0-8424-c2be-cc89-9e4e23ed868c@2ndquadrant.com
Lists: pgsql-hackers

hi,

On 10/28/2017 02:41 AM, Nico Williams wrote:
> On Fri, Oct 27, 2017 at 10:06:58PM +0200, Tomas Vondra wrote:
>>> + * We use an optimisation that initially we store the uint32 values directly,
>>> + * without the extra hashing step. And only later filling the bitmap space,
>>> + * we switch to the regular bloom filter mode.
>>>
>>> I don't think that optimization is worthwhile. If I'm going to be using
>>> a Bloom filter, it's because I expect to have a lot of rows.
>>>
>>> (Granted, if I CREATE TABLE .. LIKE .. INCLUDING INDEXES, and the new
>>> table just won't have lots of rows, then I might want this optimization,
>>> but I can always just drop the Bloom filter index, or not include
>>> indexes in the LIKE.)
>>
>> I think you're confusing "rows" and "distinct values". Furthermore, it's
>> rather irrelevant how many rows are in the table, because BRIN indexes
>> split the table into "ranges" that are 1MB by default. And you can only
>> squash a certain number of rows into such a range.
>>
>> The idea of the optimization is to efficiently support cases where each
>> range contains only a small number of distinct values - which is quite
>> common in the cases I described in my initial message (with bursts of
>> rows with the same IP / client identifier etc.).
>
> What range? It's just bits to set.
>
> The point is that Bloom filters should ideally be about 50% full -- much
> less than that and you are wasting space, much more than that and your
> false positive rate becomes useless.

The range defined by BRIN indexes. BRIN indexes split the table into a
sequence of page ranges (128 pages by default, i.e. 1MB), and the bloom
filters are built on those table "chunks". So it's irrelevant how many
rows are in the table in total; what matters is just the page range.
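
Just to give a feel for the numbers involved per range, the textbook
sizing formulas look roughly like this (illustrative only -- the values
and names are made up, this is not code from the patch):

    #include <math.h>
    #include <stdio.h>

    /*
     * Rough sketch only: size a bloom filter for one BRIN page range,
     * given the expected number of distinct values in that range and a
     * target false positive rate, using the textbook formulas
     *   m = -n * ln(p) / (ln 2)^2        (bits in the filter)
     *   k = (m / n) * ln(2)              (number of hash functions)
     */
    static void
    bloom_filter_size(double ndistinct, double fp_rate,
                      int *nbits, int *nhashes)
    {
        double      ln2 = log(2.0);
        double      m = -ndistinct * log(fp_rate) / (ln2 * ln2);

        *nbits = (int) ceil(m);
        *nhashes = (int) round((m / ndistinct) * ln2);
    }

    int
    main(void)
    {
        int         nbits;
        int         nhashes;

        /* e.g. ~500 distinct values in a 1MB range, 1% false positives */
        bloom_filter_size(500, 0.01, &nbits, &nhashes);
        printf("%d bits (~%d bytes), %d hash functions\n",
               nbits, (nbits + 7) / 8, nhashes);

        return 0;
    }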

>
>>> Filter compression is not worthwhile. You want to have a fairly uniform
>>> hash distribution, and you want to end up with a Bloom filter that's not
>>> much more than 50% full. That means that when near 50% full the filter
>>> will not be very sparse at all. Optimizing for the uncommon case
>>> doesn't seem worthwhile, and adds complexity.
>>
>> Properly sizing the bloom filter requires knowledge of many variables,
>> in particular the number of distinct values expected to be added to the
>> filter. But we don't really know that number, and we don't even know
>> many of the values useful for estimating it (how many rows will fit into
>> a range, the number of distinct values in the whole table, etc.)
>
> This is why Scalable Bloom filters exist: so you can adapt.
>
>> So the idea was to oversize the bloom filter, and then use the sparse
>> representation to reduce the size.
>
> A space-efficient representation of sparse bitmaps is not as simple as a
> Scalable Bloom filter.
>
> And a filter that requires user intervention to size correctly, or which
> requires rebuilding when it gets too full, is also not as convenient as
> a Scalable Bloom filter.
>

Maybe. For now I'm fine with the simple relopts-based approach and don't
plan to spend much time on these improvements. That's not to say they
aren't worthwhile, but this isn't the only thing I'm working on.

I also suspect there are practical implications, as some things are only
possible with equally-sized bloom filters. I believe the "union"
(missing in the current patch) will rely on that when merging bloom filters.
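
To illustrate why I expect that: the "union" is essentially just a
bitwise OR of the two bitmaps, which only works when both filters are
parameterized the same way. A rough sketch (made-up names, not the
actual patch code):

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Sketch only: merging ("union") of two bloom filters is a simple
     * bitwise OR of their bitmaps, but that is only well defined when
     * both filters have the same number of bits and use the same hash
     * functions -- which is the practical reason to keep the per-range
     * filters equally sized.
     */
    typedef struct BloomFilter
    {
        size_t      nbits;          /* length of the bitmap, in bits */
        int         nhashes;        /* number of hash functions used */
        unsigned char bitmap[];     /* the bitmap itself */
    } BloomFilter;

    static bool
    bloom_union(BloomFilter *dst, const BloomFilter *src)
    {
        size_t      nbytes = (dst->nbits + 7) / 8;

        /* refuse to merge filters with different parameters */
        if (dst->nbits != src->nbits || dst->nhashes != src->nhashes)
            return false;

        for (size_t i = 0; i < nbytes; i++)
            dst->bitmap[i] |= src->bitmap[i];

        return true;
    }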

>>> + * XXX We can also watch the number of bits set in the bloom filter, and
>>> + * then stop using it (and not store the bitmap, to save space) when the
>>> + * false positive rate gets too high.
>>>
>>> Ah, right, what you want is a "Scalable Bloom Filter".
>>
>> That's not what I had in mind. My idea was to size the bloom filter on
>> "best effort" basis, and then if one range gets a bit too inaccurate
>> then just get rid of the filter. If that happens for many ranges, we
>> should probably allow specifying parameters as relopts for the index.
>
> Scalable Bloom filters are way more convenient than that. They're
> always not-too-full, and only the open filter is less-than-full-enough.
>
> And since you should grow them somewhat exponentially (but not quite as
> much as a doubling in each generation), there should never be too many
> filters to search. But you can always "vacuum" (rebuild) the filter
> starting with the size of the last filter added prior to the vacuum.
>
>> I think this is really over-engineering, and I certainly don't plan
>> to extend the patch in this direction.
>
> Whereas I think a sparse bitmap representation is overly complex and
> "over-engineering". Scalable Bloom filters are very well understood in
> the literature -- there's nothing terribly complex to them.
>

That "sparse bitmap" is mentioned merely in an XXX comment as a thing to
consider, not something I plan to work right now. Perhaps it's not
really a good idea in general, or maybe it's addressing a non-issue. In
which case we don't need scalable bloom filters either.
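
To be a bit more concrete about the "stop using the filter" fallback,
what I have in mind is roughly something along these lines (a sketch
with made-up names, not actual patch code):

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Sketch only: keep a counter of the bits set in the filter, and
     * once the fill fraction crosses some threshold (~50%, per the
     * discussion above) the false positive rate is no longer useful,
     * so the filter for that range can simply be dropped and the range
     * degrades to "recheck everything".
     */
    #define BLOOM_MAX_FILL_FRACTION     0.5

    typedef struct BloomFillState
    {
        size_t      nbits;      /* total number of bits in the bitmap */
        size_t      nbits_set;  /* how many of them are currently set */
    } BloomFillState;

    static bool
    bloom_worth_keeping(const BloomFillState *state)
    {
        double      fill = (double) state->nbits_set / (double) state->nbits;

        return fill <= BLOOM_MAX_FILL_FRACTION;
    }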

>> I do not expect these parameters (particularly the number of distinct
>> values in a range) to significantly change over time, so the easiest
>> solution is to provide a reloption to specify that number in
>> CREATE/ALTER index.
>
> Doesn't this depend on the use-case though? I think a self-tuning
> system is better than one that doesn't self-tune. Call that
> over-engineering if you like, but it doesn't make it not desirable :)
>

I'm simply not convinced we need that additional complexity, which has
its own cost. Also, if "self-tuning" means creating multiple bloom
filters with different sizes, then it may have an impact on some
operations (like "union").

>> Alternatively, we could allow the summarization process to gather some
>> statistics (either on the whole range, or perhaps the whole table), and
>> store them somewhere for later use. For example to compute the number of
>> distinct values per range (in the existing data), and use that later.
>
> Again, Scalable Bloom filters automatically adapt without needing a
> statistics gathering exercise. All you need is a good hash function
> (that's another topic).
>
> Scalable Bloom filters are a trivial extension of Bloom filters.
>

Sure. But it's additional complexity and I'm not convinced we need it
(considering how BRIN splits the table into ranges of limited size).
Hence I'll adopt the simple approach in my patch, and if it turns out to
be necessary, it can be improved in this direction.

>>> What I'm getting at is that the query planner really needs to understand
>>> that a Bloom filter is a probabilistic data structure.
>>
>> It does, and we never produce incorrect results. Which seems like the
>> right thing to do.
>
> OK, I saw your other response about this. I didn't know that PG already
> has support for probabilistic indexes.
>

It's not so much "probabilistic" as understanding that some indexes may
produce false positives, in which case a recheck is needed.
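
In bloom filter terms, a lookup can only ever answer "definitely not in
this range" or "maybe in this range", roughly like this (sketch only,
not the patch code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Sketch only: a bloom filter lookup never gives a definite "yes",
     * so a match coming from the index still has to be rechecked
     * against the actual heap tuples in that page range.
     */
    static bool
    bloom_might_contain(const unsigned char *bitmap, size_t nbits,
                        const uint32_t *hashes, int nhashes)
    {
        for (int i = 0; i < nhashes; i++)
        {
            size_t      bit = hashes[i] % nbits;

            if ((bitmap[bit / 8] & (1 << (bit % 8))) == 0)
                return false;   /* definitely not in this range */
        }

        return true;            /* maybe in this range => recheck */
    }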

> Nico
>

Thanks for the comments and feedback!

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
