extended operator classes vs. type interfaces

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: extended operator classes vs. type interfaces
Date: 2010-04-09 02:29:18
Message-ID: r2t603c8f071004081929v81198fferbff67720c815af6f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

There are a couple of features that were considered but not committed
for 9.0 which require additional information about the properties of
various data types. At PGeast I had a chance to talk with Jeff Davis
about this briefly, and wanted to write up some of what we talked
about plus some further thoughts of my own before they went out of my
head. The features I'm thinking about are:

1. knngist wants to use index scans to speed up queries of the form
SELECT ... ORDER BY <column> <op> <constant> (as opposed to the
existing machinery which only knows how to use an index for SELECT ...
ORDER BY <column>).
2. Window functions want to define windows over a range of values
defined by the underlying data type. To do this, we need to define
what addition and subtraction mean for a particular data type.
3. Jeff Davis is interested in implementing range types. When the
underlying base type is discrete, e.g. integers, you can say that
[1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in
that order).

All of these problems seem loosely related: we're teaching the
database more about how certain types behave. But there are some
important differences. In case #1, we're trying to teach the planner
that if it sees a certain operator, it can map that operator onto an
AM strategy - so the knowledge is fundamentally AM-specific. In the
remaining cases, the information needed is a property of the
underlying data type with no natural or logical relationship to an
access method. For that reason, I don't think there's going to be any
clean way to create a single mechanism that will encompass all of
these needs, though maybe someone has a clever idea I'm not thinking
of.

What we discussed doing before for #1 (and it make sense to me) is to
add a column pg_amop.amopcategory. The existing operator class
functions will constitute one category - map a boolean operator onto
an index strategy number that can be used to treat a filter condition
as an index qual. The functions knngist cares about will be a second
category - map an operator that returns some arbitrary type that is
legal in the context of an ORDER BY clause onto an index strategy
number that can regurgitate the tuples in the order defined by the
return type.

While it might be possible to shoehorn the remaining cases into the
operator class machinery, it seems likely that it will be nothing but
ugly. The whole charter of the operator class machinery at least AIUI
is to map operators onto AM-specific index strategy numbers, and there
is neither an applicable AM nor a strategy number for it in any of
these cases. So I think it's time to create a separate concept of
type interfaces (Jeff Davis proposed this name, and I like it). What
might this look like?

Given a type T, I think we'd like to be able to define a type U as
"the natural type to be added to or subtracted from T". As Jeff
pointed out to me, this is not necessarily the same as the underlying
type. For example, if T is a timestamp, U is an interval; if T is a
numeric, U is also a numeric; if T is a cidr, U is an integer. Then
we'd like to define a canonical addition operator and a canonical
subtraction operator. I think that would be sufficient for the needs
of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be
nearly sufficient for range types, but in that case you also need to
specify the unit increment within U - i.e. a "1" value for the
datatype. It may or may not be worth building the concept of a unit
increment into the type interface machinery, though: one could imagine
two different range types built over the same base type with different
unit increments - e.g. one timestamp range with unit increment = 1s,
and one with unit increment = 1m. Under the first type [4pm,5pm) =
[4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm].

In a previous thread on this topic, in response to a question about
other possible needs for operator knowledge, Hitoshi Harada mentioned
range partitioning as another possible case. If we have a partition
where x ranges from 1 to 100, that's basically saying that x >= 1 and
x <= 100, for suitable values of >= and <=. I think we're already
habituated to doing this kind of thing by looking at the default btree
opclass for the data type and looking for the operator that
implements, e.g. BTLessEqualsStrategyNumber. It might be cleaner to
get all this information from type interfaces, but I'm not sure
whether it's reasonable (either for reasons of complexity or of
performance) to think about untangling all the places where we've
already made this assumption and redoing them.

Thoughts?

...Robert

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takahiro Itagaki 2010-04-09 02:46:51 Re: GSOC PostgreSQL partitioning issue
Previous Message Joachim Wieland 2010-04-08 23:17:45 a faster compression algorithm for pg_dump