Re: Range types

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Scott Bailey <artacus(at)comcast(dot)net>, hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Range types
Date: 2009-12-15 22:02:07
Message-ID: 1260914527.13414.1977.camel@monkey-cat.sm.truviso.com
Lists: pgsql-hackers

On Tue, 2009-12-15 at 11:49 -0800, David Fetter wrote:
> That sounds like a recipe for disaster. Whatever timestamp ranges
> are, float and int64 should be treated the same way so as not to get
> "surprises" due to implementation details.

It's a compile-time option that will change the semantics of timestamps
regardless. Anyone who changes between float and int64 timestamps may
experience problems for a number of different reasons. What was unique
before might no longer be unique, for instance.
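As a rough illustration of the uniqueness point, here is a minimal Python sketch. It assumes float timestamps are seconds stored as a double and int64 timestamps are whole microseconds; the helper name `to_int64_micros` is hypothetical and is not a PostgreSQL function.

```python
def to_int64_micros(seconds: float) -> int:
    """Round a float timestamp (seconds) to whole microseconds.

    Hypothetical helper: it only models the loss of sub-microsecond
    resolution when moving from a float to an integer representation.
    """
    return round(seconds * 1_000_000)


# Two float timestamps that differ by less than one microsecond.
first = 0.1234562
second = 0.1234564

# Distinct as floats, but they land on the same integer microsecond.
distinct_as_float = first != second
same_as_int64 = to_int64_micros(first) == to_int64_micros(second)
```

Under these assumptions, a unique constraint that held on the float values would be violated after conversion.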

> FWIW, I think it would be a good idea to treat timestamps as
> continuous in all cases.

I disagree. There is a lot of value in treating timestamp ranges as
discrete.

One big reason is that the ranges can be translated between the
different input/output forms, and there's a canonical form. As we know,
a huge amount of the value in an RDBMS is unifying data from multiple
applications with different conventions.

So, let's say one application uses (] and another uses [). If you are
mixing the data and returning it to the application, you want to be able
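to provide the result according to its convention. You can't do that
with a continuous range, because turning an exclusive bound into an
inclusive one requires a well-defined next or previous value.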
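The conversion between conventions can be sketched as follows. This is only an illustration over integers (for example, microsecond ticks); `DiscreteRange`, `to_convention`, `render`, and the `step` parameter are hypothetical names, not part of any proposed patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteRange:
    lower: int
    upper: int
    lower_inc: bool  # True for '[', False for '('
    upper_inc: bool  # True for ']', False for ')'


def to_convention(r: DiscreteRange, lower_inc: bool, upper_inc: bool,
                  step: int = 1) -> DiscreteRange:
    """Re-express r with the requested bound style, keeping the same members."""
    # First normalize to a closed range [lo, hi].
    lo = r.lower if r.lower_inc else r.lower + step
    hi = r.upper if r.upper_inc else r.upper - step
    # Then move each bound outward by one step if it must be exclusive.
    new_lower = lo if lower_inc else lo - step
    new_upper = hi if upper_inc else hi + step
    return DiscreteRange(new_lower, new_upper, lower_inc, upper_inc)


def render(r: DiscreteRange) -> str:
    left = "[" if r.lower_inc else "("
    right = "]" if r.upper_inc else ")"
    return f"{left}{r.lower}, {r.upper}{right}"


# An application that stores (] asks for the [) form of the same range.
stored = DiscreteRange(10, 20, lower_inc=False, upper_inc=True)
as_half_open = to_convention(stored, lower_inc=True, upper_inc=False)
```

The `+ step` and `- step` operations are exactly what a continuous domain lacks: there is no "next" real number after 10.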

And things get more interesting: if you mix (] and [), then range_union
will produce () and range_intersect will produce []. So now you have all
four conventions floating around the same database.
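To see how the bound styles combine, here is a small sketch. The function names `range_union` and `range_intersect` come from the message above, but the implementations are assumptions for illustration only, and they assume the two ranges overlap.

```python
from typing import NamedTuple


class Range(NamedTuple):
    lower: float
    lower_inc: bool
    upper: float
    upper_inc: bool


def _lower_key(r: Range):
    # At equal values, an inclusive lower bound starts earlier than an exclusive one.
    return (r.lower, 0 if r.lower_inc else 1)


def _upper_key(r: Range):
    # At equal values, an inclusive upper bound ends later than an exclusive one.
    return (r.upper, 1 if r.upper_inc else 0)


def range_union(a: Range, b: Range) -> Range:
    """Union of two overlapping ranges: earliest lower bound, latest upper bound."""
    lo = min(a, b, key=_lower_key)
    hi = max(a, b, key=_upper_key)
    return Range(lo.lower, lo.lower_inc, hi.upper, hi.upper_inc)


def range_intersect(a: Range, b: Range) -> Range:
    """Intersection of two overlapping ranges: latest lower bound, earliest upper bound."""
    lo = max(a, b, key=_lower_key)
    hi = min(a, b, key=_upper_key)
    return Range(lo.lower, lo.lower_inc, hi.upper, hi.upper_inc)


def brackets(r: Range) -> str:
    return ("[" if r.lower_inc else "(") + ("]" if r.upper_inc else ")")


left_open = Range(0, False, 10, True)     # (0, 10]
right_open = Range(5, True, 15, False)    # [5, 15)
union_style = brackets(range_union(left_open, right_open))
intersect_style = brackets(range_intersect(left_open, right_open))
```

With the (] range starting first, the union inherits both exclusive bounds and the intersection inherits both inclusive ones. If the [) range starts first instead, the two results swap styles, so all four conventions still appear.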

Another reason you might mix conventions: say you have log data from
several sources, and some sources provide timestamps for an event which
is essentially "instantaneous" and other sources will log the period of
time over which the event occurred, in [) format. To mix the data
coherently, the correct thing to do is to treat each instantaneous point
as a singleton range; but the only way to represent a singleton continuous
range is by using [].
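A quick membership check illustrates the singleton point. The `contains` helper below is a hypothetical sketch, not an existing API.

```python
def contains(lower, upper, lower_inc: bool, upper_inc: bool, x) -> bool:
    """Return True if x lies within the range described by the bounds."""
    above = x >= lower if lower_inc else x > lower
    below = x <= upper if upper_inc else x < upper
    return above and below


t = 5.0  # an "instantaneous" event on a continuous axis

# Of the four bound styles with lower == upper == t, only [] contains t.
styles = {
    "[]": contains(t, t, True, True, t),
    "[)": contains(t, t, True, False, t),
    "(]": contains(t, t, False, True, t),
    "()": contains(t, t, False, False, t),
}

# On a discrete axis (integers here), [5, 6) holds exactly one value,
# so the singleton can also be written in the [) convention.
members = [x for x in range(0, 10) if contains(5, 6, True, False, x)]
```

On the discrete axis the singleton fits alongside the [) data from the other sources; on the continuous axis it forces a second convention into the data set.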

Whenever you mix conventions, you either have to be able to change the
format (which is only possible with discrete ranges) or teach the
application to understand your convention. And if you don't have a
canonical form (which is only possible with discrete ranges), you can't
reliably compare values for equality, or see if they are adjacent.
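Equality and adjacency through a canonical form can be sketched like this. The choice of [) as the canonical form and the names `canonical`, `ranges_equal`, and `ranges_adjacent` are assumptions made for the example; ranges are plain tuples of (lower, upper, lower_inc, upper_inc) over integers.

```python
def canonical(lower: int, upper: int, lower_inc: bool, upper_inc: bool,
              step: int = 1) -> tuple:
    """Normalize a discrete range to [lower, upper) form."""
    lo = lower if lower_inc else lower + step
    hi = upper + step if upper_inc else upper
    return (lo, hi)


def ranges_equal(a: tuple, b: tuple) -> bool:
    """Two ranges are equal if their canonical forms match."""
    return canonical(*a) == canonical(*b)


def ranges_adjacent(a: tuple, b: tuple) -> bool:
    """Two ranges are adjacent if one ends exactly where the other begins."""
    a_lo, a_hi = canonical(*a)
    b_lo, b_hi = canonical(*b)
    return a_hi == b_lo or b_hi == a_lo


# (10, 20] and [11, 21) describe the same set of integers.
same = ranges_equal((10, 20, False, True), (11, 21, True, False))

# [1, 5] and [6, 9] touch with no gap and no overlap.
touching = ranges_adjacent((1, 5, True, True), (6, 9, True, True))
```

Without a step, `[1, 5]` and `[6, 9]` on a continuous axis leave the gap (5, 6) uncovered, and `(10, 20]` can never be shown equal to any [) range.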

Saying that discrete ranges are unnecessary is essentially saying that
there's only a use for one convention; or that the conventions will
never be mixed; or that applications will always be smart enough to sort
it out for themselves.

Regards,
Jeff Davis
