Re: Minmax indexes

From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-06-17 23:48:07
Message-ID: CAM-w4HPPs86--tRGyX8gDxXQX0rd208+ztFmA5GWq=3G-2QjBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 17, 2014 at 11:16 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Well, it might be doable to correlate them along one axis, but along
> both? That's more complicated... And even alongside one axis you
> already get into problems if your geometries are irregularly sized.
> Asingle large polygon will completely destroy indexability for anything
> stored physically close by because suddently the minmax range will be
> huge... So you'll need to cleverly sort for that as well.

I think there's a misunderstanding here, possibly mine. My
understanding is that a min/max index will always be exactly the same
size for a given size table. It stores the minimum and maximum value
of the key for each page. Then you can do a bitmap scan by comparing
the search key with each page's minimum and maximum to see if that
page needs to be included in the scan. The failure mode is not that
the index is large but that a page that has an outlier will be
included in every scan as a false positive incurring an extra iop.

I don't think it's implausible at all that Geometric data would work
well. If you load Geometric data it's very common to load data by
geographic area so that all objects in San Francisco in one part of
the data load, probably even by zip code or census block.

What operations would an opclass for min/max need? I think it would be
pretty similar to the operators that GiST needs (thankfully minus the
baroque page split function):

An aggregate to generate a min/max "bounding box" from several values
A function which takes an "bounding box" and a new value and returns
the new "bounding box"
A function which tests if a value is in a "bounding box"
A function which tests if a "bounding box" overlaps a "bounding box"

The nice thing is this would let us add things like range @> (contains
element) to the plain integer min/max case.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-06-18 00:11:38 Re: btreecheck extension
Previous Message Andres Freund 2014-06-17 23:25:58 Re: comparison operators