Re: Minmax indexes

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-06-21 18:08:57
Message-ID: 20140621180857.GA24593@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm sorry if I missed something, but ISTM this is beginning to look a
lot like GiST. This was pointed out by Robert Haas last year.

On Wed, Jun 18, 2014 at 12:09:42PM -0300, Claudio Freire wrote:
> So, you have:
>
> An aggregate to generate a "compressed set" from several values

Which GiST does by calling 'compress' on each value, and the 'unions' the
results together.

> A function which adds a new value to the "compressed set" and returns
> the new "compressed set"

Again, 'compress' + 'union'

> A function which tests if a value is in a "compressed set"

Which GiST does using 'compress' +'consistant'

> A function which tests if a "compressed set" overlaps another
> "compressed set" of equal type

Which GiST calls 'consistant'

So I'm wondering why you can't just reuse the btree_gist functions we
already have in contrib. It seems to me that these MinMax indexes are
in fact a variation on GiST that indexes the pages of a table based
upon the 'union' of all the elements in a page. By reusing the GiST
operator class you get support for many datatypes for free.

> If you can define different compressed sets, you can use this to
> generate both min/max indexes as well as bloom filter indexes. Whether
> we'd want to have both is perhaps questionable, but having the ability
> to is probably desirable.

You could implement bloom filter in GiST too. It's been discussed
before but I can't find any implementation. Probably because the filter
needs to be parameterised and if you store the bloom filter for each
element it gets expensive very quickly. However, hooked into a minmax
structure which only indexes whole pages it could be quite efficient.

> One problem with such a generalized implementation would be, that I'm
> not sure in-place modification of the "compressed set" on-disk can be
> assumed to be safe on all cases. Surely, for strictly-enlarging sets
> it would, but while min/max and bloom filters both fit the bill, it's
> not clear that one can assume this for all structures.

I think GiST has already solved this problem.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-06-21 18:23:44 Re: idle_in_transaction_timeout
Previous Message Kevin Grittner 2014-06-21 18:06:26 Re: delta relations in AFTER triggers