Re: Minmax indexes

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-06-19 17:20:33
Message-ID: CAGTBQpZo6KNdt4CQxw42Hbh72Turc87L9+Sy4VRJsXpE+O0oVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 19, 2014 at 10:06 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 06/18/2014 06:09 PM, Claudio Freire wrote:
>>
>> On Tue, Jun 17, 2014 at 8:48 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>>>
>>> An aggregate to generate a min/max "bounding box" from several values
>>> A function which takes an "bounding box" and a new value and returns
>>> the new "bounding box"
>>> A function which tests if a value is in a "bounding box"
>>> A function which tests if a "bounding box" overlaps a "bounding box"
>>
>>
>> Which I'd generalize a bit further by renaming "bounding box" with
>> "compressed set", and allow it to be parameterized.
>
>
> What do you mean by parameterized?

Bloom filters can be paired with number of hashes, number of bit
positions, and hash function, so it's not a simple bloom filter index,
but a bloom filter index with N SHA-1-based hashes spread on a
K-length bitmap.

>> So, you have:
>>
>> An aggregate to generate a "compressed set" from several values
>> A function which adds a new value to the "compressed set" and returns
>> the new "compressed set"
>> A function which tests if a value is in a "compressed set"
>> A function which tests if a "compressed set" overlaps another
>> "compressed set" of equal type
>
>
> Yeah, something like that. I'm not sure I like the "compressed set" term any
> more than bounding box, though. GiST seems to have avoided naming the thing,
> and just talks about "index entries". But if we can come up with a good
> name, that would be more clear.

I don't want to use the term bloom filter since it's very specific of
a specific technique, but it's basically that - an approximate set
without false negatives. Ie: compressed set.

It's not a bounding box either when using bloom filters. So...

>> One problem with such a generalized implementation would be, that I'm
>> not sure in-place modification of the "compressed set" on-disk can be
>> assumed to be safe on all cases. Surely, for strictly-enlarging sets
>> it would, but while min/max and bloom filters both fit the bill, it's
>> not clear that one can assume this for all structures.
>
>
> I don't understand what you mean. It's a fundamental property of minmax
> indexes that you can always replace the "min" or "max" or "compressing set"
> or "bounding box" or whatever with another datum that represents all the
> keys that the old one did, plus some.

Yes, and bloom filters happen to fall on that category too.

Never mind what I said. I was thinking of other potential and
imaginary implementation that supports removal or updates, that might
need care with transaction lifetimes, but that's easily fixed by
letting vacuum or some lazy process do the deleting just as it happens
with other indexes anyway.

So, I guess the interface must include also the invariant that
compressed sets only grow, never shrink unless from a rebuild or a
vacuum operation.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-06-19 17:31:28 Re: -DDISABLE_ENABLE_ASSERT
Previous Message Tom Lane 2014-06-19 17:18:48 Re: change alter user to be a true alias for alter role