Re: Minmax indexes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-07-09 21:16:19
Message-ID: 20140709211619.GQ6390@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Claudio Freire wrote:

> An aggregate to generate a "compressed set" from several values
> A function which adds a new value to the "compressed set" and returns
> the new "compressed set"
> A function which tests if a value is in a "compressed set"
> A function which tests if a "compressed set" overlaps another
> "compressed set" of equal type
>
> If you can define different compressed sets, you can use this to
> generate both min/max indexes as well as bloom filter indexes. Whether
> we'd want to have both is perhaps questionable, but having the ability
> to is probably desirable.

Here's a new version of this patch, which is more generic the original
versions, and similar to what you describe.

The way it works now, each opclass needs to have three support
procedures; I've called them getOpers, maybeUpdateValues, and compare.
(I realize these names are pretty bad, and will be changing them.)
getOpers is used to obtain information about what is stored for that
data type; it says how many datum values are stored for a column of that
type (two for sortable: min and max), and how many operators it needs
setup. Then, the generic code fills in a MinmaxDesc(riptor) and creates
an initial DeformedMMTuple (which is a rather ugly name for a minmax
tuple held in memory). The maybeUpdateValues amproc can then be called
when there's a new heap tuple, which updates the DeformedMMTuple to
account for the new tuple (in essence, it's a union of the original
values and the new tuple). This can be done repeatedly (when a new
index is being created) or only once (when a new heap tuple is inserted
into an existing index). There is no need for an "aggregate".

This DeformedMMTuple can easily be turned into the on-disk
representation; there is no hardcoded assumption on the number of index
values stored per heap column, so it is possible to build an opclass
that stores a bounding box column for a geometry heap column, for
instance.

Then we have the "compare" amproc. This is used during index scans;
after extracting an index tuple, it is turned into DeformedMMTuple, and
the "compare" amproc for each column is called with the values of scan
keys. (Now that I think about this, it seems pretty much what
"consistent" is for GiST opclasses). A true return value indicates that
the scan key matches the page range boundaries and thus all pages in the
range are added to the output TID bitmap.

Of course, you can have multicolumn indexes, and (as would be expected)
each column can have totally different opclasses; so for instance you
could have an integer column and a geometric column in the same index,
and it should work fine. In a query that constrained both columns, only
those page ranges that satisfied the scan keys for both columns would be
returned.

I think this level of abstraction is good --- AFAICS it is easy to
implement opclasses for datatypes not suitable for btree opclasses such
as geometric ones, etc. This answers the concerns of those who wanted
to see this support datatypes that don't have the concept of min/max at
all. I'm not sure about bloom filters, as I've forgotten how those
work. Of course, the implementation of min/max is there: that logic has
been abstracted out into what I've called "sortable opfamilies" for now
(name suggestions welcome) --- it can be used for any datatype with a
btree opclass.

I think design-wise it ended up making a lot of sense, because all the
opclass-specific assumptions about usage of "min" and "max" values and
comparisons using the less-than etc operators are contained in the
mmsortable.c file, and the basic minmax.c file only needs to know to
call the right opclass-specific procedures. The basic code might need
some tweaks to ensure that we're not assuming anything about the
datatypes of the stored values (as opposed to the datatypes of the
indexed columns), but this is a matter of tweaking the MinmaxDesc and
the getOpers amprocs; it wouldn't require changing the on-disk
representation, and thus is upgrade-compatible.

There's a bit of boilerplate code in the amproc routines which would be
nice to be able to remove (mainly involving null values), but I think to
do that we would need to split those three amprocs into maybe four or
five, which is not as nice, so I'm not real sure about doing it.

All this being said, I'm sticking to the name "Minmax indexes". There
was a poll in pgsql-advocacy
http://www.postgresql.org/message-id/53A0B4F8.8080803@agliodbs.com
about a new name, but there were no suggestions supported by more than
one person. If a brilliant new name comes up, I'm open to changing it.

Another thing I noticed is that version 8 of the patch blindly believed
the "pages_per_range" declared in catalogs. This meant that if somebody
did "alter index foo set pages_per_range=123" the index would
immediately break (i.e. return corrupted results when queried). I have
fixed this by storing the pages_per_range value used to construct the
index in the metapage. Now if you do the ALTER INDEX thing, the new
value is only used when the index is recreated by REINDEX.

There are still things to go over before this is committable,
particularly regarding vacuuming the index, but as far as index creation
and scanning it should be good to test. (Vacuuming should work just
fine most of the time also, but there are a few wrinkles pointed out by
Heikki.)

One thing I've disabled for now is the pageinspect code that displays
index items. I need to rewrite that.

Closing thought: thinking more about it, the bit about returning
function OIDs by getOpers when creating a MinmaxDesc seems unnecessary.
I think we could go by with just returning the number of values stored
in the column, and have the operators be part of an opaque struct that's
initialized and only touched by the opclass amprocs, not by the generic
code.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
minmax-9.patch text/x-diff 164.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-07-09 21:19:46 Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Previous Message Peter Geoghegan 2014-07-09 19:59:30 Re: Doing better at HINTing an appropriate column within errorMissingColumn()