Re: Minmax indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2013-10-01 10:18:24
Message-ID: CA+TgmoYSCbW-UC8LQV96sziGnXSuzAyQbfdQmK-FGu22HdKkaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 30, 2013 at 1:20 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> You can almost create a bounding box opclass in the current implementation,
> by mapping < operator to "contains" and > to "not contains". But there's no
> support for creating a new, larger, bounding box on insert. It will just
> replace the max with the new value if it's "greater than", when it should
> create a whole new value to store in the index that covers both the old and
> the new values. (or less than? I'm not sure which way those operators would
> work..)

This sounds an awful lot like GiST's "union" operation. Actually,
following the GiST model of having "union" and "consistent" operations
might be a smart way to go. Then the exact index semantics could be
decided by the opclass. This might not even be that much extra code;
the existing consistent and union functions for GiST are pretty short.
That way, it'd be easy to add new opclasses with somewhat different
behavior; the common thread would be that every opclass of this new AM
works by summarizing a physical page range into a single indexed
value. You might call the AM something like "summary" or "sparse" and
then have "minmax_ops" for your first opclass.

> In fact, even with regular b-tree operators, over integers for example, you
> don't necessarily want to store both min and max. If you only ever perform
> queries like "WHERE col > ?", there's no need to track the min value. So to
> make this really general, you should be able to create an index on only the
> minimum or maximum. Or if you want both, you can store them as separate
> index columns. Something like:
>
> CREATE INDEX minindex ON table (col ASC); -- For min
> CREATE INDEX minindex ON table (col DESC); -- For max
> CREATE INDEX minindex ON table (col ASC, col DESC); -- For both

This doesn't seem very general, since you're relying on the fact that
ASC and DESC map to < and >. It's not clear what you'd write here if
you wanted to optimize #$ and @!. But something based on opclasses
will work, since each opclass can support an arbitrary set of
operators.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-10-01 10:20:20 Re: Cmpact commits and changeset extraction
Previous Message Amit Kapila 2013-10-01 10:00:44 Re: Documentation for SET var_name FROM CURRENT