Re: Ranges for well-ordered types

From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Ranges for well-ordered types
Date: 2006-06-14 06:47:16
Message-ID: F329B457-0572-49FC-BED7-47A81D2B3D5A@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks to everyone for the feedback that I've received so far. It's
clear that there's interest in this.

On Jun 12, 2006, at 3:22 , Josh Berkus wrote:

> I do think Jim is right, though, in that we may want to look for
> portions of
> the functionality which are achievable in the context of one
> PostgreSQL
> version, unless you're going to be working full-time on this patch.

I definitely agree with implementing it in parts. I doubt it's
possible, but perhaps a first bit might make it into 8.2 :)

> In real-world calendaring applications, I *certainly* see the need
> for a
> successor function. However, that would require being able to define
> timestamps with a variable precision, e.g. TIMESTAMP('5 minutes').
> This, by
> itself would be a significant effort, yet useful ... maybe that's
> where to
> start?

As mentioned in an earlier email, I think calendaring applications in
particular would benefit from timestamp precisions of less than 1
second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However,
I think this is a thorny problem. To elaborate, I believe the
precision has to be relative to some "baseline". From 12:00, 30
minute precision would presumably allow 12:00, 12:30, 13:00, 13:30,
and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and
so on. But these are relative to the time zone they're in. While
12:00 in Tokyo (+9) would be a timestamp value with 1 hour precision,
that same timestamp is 4:30 in Tehran (+3:30) if I got the math
right. Is 4:30 a timestamp value with 1 hour precision? Because of
this, I think timestamp precisions of less than 1 second (timestamp
(0)) require storing the time zone as part of the timestamp value.

Pushing this even further, would we allow arbitrary precision? For
example, would 45-minute precision be allowed? In that case, I
believe we'd need to go further than storing just the time zone with
the timestamp value. The timestamp value would have to be relative to
some baseline timestamp to be able to calculate whether or not the
difference between any particular timestamp and the baseline
timestamp is integral. Perhaps this could be accomplished using
domains and some additional checking function? I'm not sure. It's
enough to make me want to forget about the idea of disallowing any
precision that is not an evenly divided into the next larger "time
part": any precision between 0 seconds and 1 minute would have to be
a number of seconds evenly divided into 60; between 1 hour and 1 day,
precision would have to be one of the values 1, 2, 3, 4, 6, 8, or 12
hours.

I've been able to discuss the issue of timestamp precision without
bringing up successor functions or ranges at all, and indeed I think
it's orthogonal to the range implementation. I think they're both
concepts that should be included in PostgreSQL, but as for myself,
I'm more interested in the range implementation than the the
timestamp precision issue.

By the way, anyone care to weigh in on what term we should use when
discussing this? Josh has used PERIOD. Should we go with that for now?

A somewhat related issue: would we want any implementation to follow
(at least part) of the not-yet-standard SQL/Temporal draft? Or would
it be more desirable to steer clear of using any terms/syntax that
was included in an attempt to prevent any possible conflict with a
future SQL spec?

> You're probably going to have to give up on B-Tree indexes for
> PERIODs, and
> look towards GiST. For one thing, I would see UNIQUE in the
> context of a
> PERIOD defined as non-overlapping. e.g.:

I think that a non-overlapping constraint goes above and beyond what
UNIQUE requires. In my opinion, UNIQUE should test for equality,
rather than non-overlapping, as that keeps the meaning of UNIQUE
consistent across all types and may actually be useful in some
instances. I do think it would be convenient to have some sort of
syntax that would provide a non-overlapping constraint rather than
having to code up a constraint trigger every time you wanted to do
this. As Martijn pointed out, when GiST can be used for a UNIQUE
constraint, we should be able to define the non-overlapping
constraint quite easily. So this could be thought of as a third
orthogonal issue for ranges, the first two being the range type
constructor and timestamp precision < 1 second. Any one of these
three could be done independently and improve PostgreSQL. In
combination they are definitely a very nice package.

On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote:

> Date ranges are really closed open as well (as finite sets of
> isolated points
> are both open and closed). The only oddity would be that the date
> used to
> indicate the open end of the range might not be what the user expects.

I think it's definitely a matter of interpretation. [2006-01-01,
2006-12-31] and [2006-01-01, 2007-01-01) both include the same days.
Who's to say which is the "real" representation? For all practical
purposes (i.e., what can be represented within the database)
[2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01
00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with
time zone ranges as well. While one might idealize time to be
continuous, as far as I know there isn't a way to represent time that
way in a computer, at the very least, not in PostgreSQL.

And for the very reason that it might not be what the user expects,
if there's a way to convert between closed-open and closed-closed as
appropriate, I think it makes it much more use friendly to do so. For
example, the closed-closed representation is equivalent to what
BETWEEN does. It would be very nice to be able to provide sometime
equivalent with ranges.

As for the successor function itself: Any "exact" datatype, such as
timestamp (at least with --enable-integer-datetimes), date, integer,
or numeric, has some built-in precision anyway and a successor
function follows quite directly from that precision. I don't see that
as problematic or even very difficult.

Thanks again for your comments, past, present and future! It's been
very helpful for me to hear from others on this.

Michael Glaesemann
grzm seespotcode net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-06-14 08:23:45 Re: Fabian Pascal and RDBMS deficiencies in fully implementing
Previous Message Tom Lane 2006-06-14 03:43:32 Re: CSV mode option for pg_dump