Ranges for well-ordered types

From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Ranges for well-ordered types
Date: 2006-06-10 14:51:58
Message-ID: 10AC5536-280E-463F-A335-71FEC1FBB774@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been interested in representing and manipulating time ranges in
PostgreSQL, where a time range is defined by a start time and an end
time. Time ranges are useful, for example, in representing when some
predicate was known valid. Similarly, time ranges can be used to
represent "transaction time": the version history of the data itself.
This is similar to the "time travel" feature in previous versions of
PostgreSQL. (Tables that include both valid time and transaction time
information are sometimes called bitemporal tables.) Ranges can also
be useful in scheduling applications.

The original use case that prompted me to explore this area was
tracking what time periods teachers were assigned to schools (a valid-
time table). Here's one way to represent this in PostgreSQL:

create table teachers__schools_1
(
teacher text not null
, school text not null
, from_date date not null
, to_date date not null
, check (from_date <= to_date)
, unique (teacher, school, from_date)
, unique (teacher, school, to_date)
);

Each row of this table represents the time range (from from_date to
to_date) during which a teacher was assigned to a particular school.
(Teachers can be assigned to more than one school at a time.) The
check constraint guarantees that [from_date, to_date] represents a
valid closed-closed interval (where the end points are included in
the range). Two unique constraints are necessary to guarantee
uniqueness for each row. In my use case, it would also be desirable
to apply constraints to prevent overlapping and ensure continuity--
that teachers are always at some school. This can be done using
constraint triggers, though making sure all of the comparisons for
four dates (beginning and end points for two ranges) are done
properly can be a little daunting and prone to bugs.

The problem of representing time ranges in a relational database has
been worked on for quite some time. Indeed, the PostgreSQL source
still includes a tinterval type, where the beginning and end points
are of type abstime, though this type is no longer included in the
documentation. Drafts of SQL3 included the PERIOD constructor to
represent date, time, and timestamp ranges as part of SQL/Temporal,
but this section of the specification was not included in the final
SQL:2003 standard. At least two relatively prominent books [1][2] as
well as numerous papers have been written on the subject.

An aside regarding terminology: In the literature I've read, time
ranges are most often referred to as intervals. However, SQL already
uses INTERVAL to refer to another distinct type. As mentioned above,
SQL3 used the term PERIOD for a constructor to create types with a
beginning point and an end point. RANGE is a reserved keyword in SQL:
2003 (related, I believe, to windowed tables). Range also has a
distinct meaning in relational calculus. I'm at a bit of a loss as to
how to refer to these structures with a beginning and an end point
with a term that isn't already reserved in SQL or may be in the
future. Suggestions welcome :) Span? Reuse tinterval? For the time
being, I'll arbitrarily continue to use range.

In the first six months of 2006 alone, I've noted quite a few threads
related to time ranges in the various PostgreSQL mailing lists, so it
seems this issue is ripe for a solution. Just two weeks ago, Albert
Hertroys started work on a vector type[3] , where he rightly notes
that time ranges are just an application of a general range (or
vector, to use his term) that could just as easily be used with
"integers, reals, points, dates, etc". For the past couple of months,
I've been working with composite types and PL/pgSQL to define and
manipulate date ranges and integer ranges, and see what can be
abstracted out to a general range type.

In the general case, a particular range value can be represented as
two point values. For example, the date range [2006-01-01,
2006-05-31] (using the closed-closed interval representation) is
represented by the two date "point" values 2006-01-01 and 2006-05-31.
The interval range [3,9] is represented by the two integer "point"
values 3 and 9. A range can be formed for any point type, where a
point type is any type that's well-ordered:
* the range of values is bounded (the number of values in the type
is finite)
* comparisons are well-defined for any two values, and
* for any point p, the next point can be found using a successor
function

Given a well-ordered "point" type, common questions of ranges can be
be answered, such as, for two ranges r1 and r2
* Do r1 and r2 overlap?
* Does r1 meet r2?
* Is the union of r1 and r2 defined, and if so, what is it?
* Is the intersection of r1 and r2 defined, and if so, what is it?

The way I've thought of implementing ranges in PostgreSQL is through
an additional system catalog table (called pg_ordered_type for
convenience) that would store the necessary information for each
point type:
* ordtype: the point type (referencing pg_type.oid)
* ordcmpfunc: the comparison function (referencing pg_proc.oid)
* ordsucc: the successor function (referencing pg_proc.oid)
* ordmin, ordmax the minimum and maximum values for the point type
(not sure how this would be represented)

From one point of view, the successor function and minimum and
maximum values (and perhaps the comparison function) are intrinsic
attributes of the type itself. However, I can see potential
usefulness using the same "base" type with different successor
functions or different maximum and minimum values. For example, to
represent a point type of even numbers, the base type could be
integer, and the successor function would "p + 2". (This raises the
question of how to restrict the values of the type to even numbers:
perhaps using domains would be a solution.) Another example would be
using timestamps of different precision: timestamp(6) has a different
successor function than timestamp(0).

It may be desirable to include in pg_ordered_type a "predecessor"
function (returning the point prior to a point p) and a duration
function (providing a more efficient way to return the number of
points included in the interval than iterating over the successor
function).

With comparison and the successor function, one can define the 13
predicates (often called Allen operators) describing the relationship
between any two range values: equals, before, meets, overlaps,
starts, during, finishes, and inverses of the last six.

Note that two of the built-in numeric types--real and double-
precision--are not well-ordered, and therefore ranges for real and
double-precision would not be defined. The primary reason for this is
that without a successor function, it's not possible to determine
whether or not two range values meet (determined by the meets Allen
operator). And without a definition for meets, it's not possible to
define continuity--to use the above example, that a teacher must
always be assigned to some school. In that respect, the
implementation I'm proposing here differs from the vector type
proposed by Albert Hertroys. Perhaps there's a way to have a "looser"
form of ranges that do not include a successor function, and for
those looser ranges, continuity constraints couldn't be defined.
However, the issues arising from the inexactness of real and double
precision calls into question the usefulness of functions that rely
on testing for equality (e.g., equals, meets, starts, finishes, and
inverses of the last three).

Another issue for a general range implementation is uniqueness.
Currently btree indexes are used to enforce uniqueness in PostgreSQL.
By somewhat arbitrarily defining a comparison result for the 13
possible predicates, I've been able to define an opclass for ranges
so btree indexes can be defined. Note that in defining an opclass,
comparision is already required, so perhaps the comparison function
could be stored someplace other than the pg_ordered_type table.

As for other indexes, I've looked at some of the GiST material and
think that it probably can be usefully applied to ranges: for example
the ip4r type[4], which defines ranges of IPv4 addresses, uses GiST
for indexing. However, GiST is currently well over my head, so this
is speculation on my part.

Returning to my original example, with a "date_range" type, the table
could be defined as:

create table teachers__schools_2
(
teacher text not null
, school text not null
, period date_range not null
, primary key (teacher, school, period)
);

The original check constraint is handled by the date_range type and
the two unique constraints are replaced by a single primary key
constraint. Constraints for overlapping and continuity are still
handled using constraint triggers, but are easier to implement using
functions available to compare ranges rather than handling beginning
and end points individually.

I believe ranges in PostgreSQL would be useful and satisfy a need
expressed in numerous mailing list threads. I hope to do my part in
their implementation. I look forward to your feedback.

Michael Glaesemann
grzm seespotcode net

[1] Richard T. Snodgrass, Developing Time-Oriented Database
Applications in SQL, Morgan Kaufmann, 2000. ISBN 1-55860-436-7. Out
of print, but available from the author as a PDF.
(http://www.cs.arizona.edu/people/rts/tdbbook.pdf)
[2] Chris Date, Hugh Darwen, Nikolas Lorentzos, Temporal Data and the
Relational Model, Morgan Kaufmann, 2002. ISBN 1-55860-855-9 (http://
www.amazon.com/gp/product/1558608559/)
[3] Vector type (Re: [GENERAL] challenging constraint situation - how
do I make it) (http://archives.postgresql.org/pgsql-general/2006-05/
msg01279.php)
[4] ip4r (http://pgfoundry.org/projects/ip4r/)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Caulfield 2006-06-10 15:54:23 Re: Ranges for well-ordered types
Previous Message Marc G. Fournier 2006-06-10 13:52:58 Re: archive threads across months (was Re: [HACKERS]