Re: Timezones (in 8.5?)

From: hgonzalez(at)gmail(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-19 19:21:47
Message-ID: 000e0cd70ece3a08d90478be4580@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Nov 19, 2009 1:18am, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:

> Right, but including more data in a single type is the wrong approach,
> since it complicates the semantics and interferes with normalization.
> For example, if you have a type T which incorporates a timestamp and a
> timezone, what semantics does the T = T operator have? What semantics
> apply if the definitions of timezones change?

I dont get the thing about normalization, there's complete ortoghonality in
my approach.
And when you say "complicates the semantic" I'd say "enrich the semantics"
(and even
clarify it) so that it fits more neatly to the typical usage of dates and
times in real life.
For "datetimes with tz" the equality (and comparison) operator is not
trivial, roughly
in the same sense that date-time arithmetic is not trivial when one stops
thinking of
datetimes as "physical time". So is life. Should the datetimes
'2010-07-27 9:30 Chile' and '2010-07-27 10:30 Argentine' (GMT+4 and GMT+3
respec)
be considered equal? It's arguable; but the ambiguity (just a matter of
adoption) reflects
reality. We can discuss it and adopt some consistent criteria.

> What if you're storing
> times of events at specific places; in that case you want to associate
> the timezone with the _place_ not the event (so that if the timezone
> rules change, moving the place from one timezone to another, you only
> have to change the place, not all the events that refer to it).

I'm not sure I undestand you here. I'm claiming that timezone rules
alterations
(zic files changes) should always be supported by the db implementation,
without
needing of touching your data. And I believe that timestamps (ie physical
times)
are in practice almost never associated to timezone information. If you
want to store
"the instant of last solar eclipse" you normally store the timestamp, a
timezone
might only be useful for displaying (or as an adittional info, not really
associated to the event)
A border case would be "store the instant of the death of John Lennon". You
might
store the TZ here if you are interested in the civil time (so you can
answer, for example,
¿how many rock stars died in morning/afternoon?). But then, again, you are
here actually
storing a civil date (local date-time plus TZ). The only problematic case i
can envision
is to intend to store a physical time in the future with TZ, but frankly it
is difficult to
think of this scenario (and even more difficult to think of needing to
operate with that
data as a whole; hence, in this case, to store the two fields separatadely
makes sense).

I'm being dense, and this might be a lost cause, but anyway, perhaps some
day in the future
this might be of some use:

I strongly believe that, if one could sample the real needings and usage of
date-time types in
applications in this world, and taking apart types DATE (very frequent, but
rather straightforward),
and TIME (not so relevant) and intervals (other issues here, much related
to datetimes), the
overwhelming majority would fall ( conceptually) into these three types:

- TIMESTAMP (physical time - no TZ - no civil time implied)
- LOCAL DATETIME (civil time, no TZ)
- DATETIME (civil time with TZ => togheter with zic tables, implies a
physical time)

And of these three -I'd bet- the first is (conceptually) the most common,
by a wide margin.

As the name TIMESTAMP implies, it frequently records the moment of a event
(in the DB corresponds
frequently to the creation or alteration of a record, frequently via
a "now()" default or such).
Examples: the timestamp of messages in a mailing list, or issues in a
bugtracker, or posts/articles in a blog/Cms.
Sometimes it is modifiable by the user. Sometimes it is displayed (as a
civil date, of course) according
to some TZ implied somewhere else. It's normal that users with differnt TZ
sees this event each
with its own TZ; and one is not directly interested on obtaining (say)
an "inherent" civil datetime for the
event (for example one is not interested in asking what posts where
generated at midnight
acording to the localtime of the user that created it).

The LOCAL DATETIME is only of use for civil date-times, when one is not
directly interested in
asociate events with real (physicial time) - this cannot be compared with a
real time (it cant trigger alarms, eg)
Or, more rarely, when the TZ is implied somehere else (in the application,
not it the DB server!).

The DATETIME is equivalent to the compound type {LOCAL_DATETIME,TZ}. Here
the "civil date-time" is again
the primary concept one deals with, but in a given place in the world (TZ),
so it implies also (with the assistance of a zic table)
a real time. This type is, IMHO, less frequent than the others. The typical
use is for calendars or schedulers.

One could, a propos Andrew's observation, consider a fourth type: TIMESTAMP
WITH TZ. But it seems overkill:
except for ZIC changes, the correspondence with DATETIME is univocal (BTW,
this is why in the Jodatime API
-which does not deal with persistence- this concepts are strictly
equivalent). Given this nearly-equivalence, and that
the needing of this type in real life is (IMO) almost null, I think that
DATETIME is the one to survive.

(One could even propose a fifht type: a TIMESTAMP WITH GMT OFFSET (roughly
the ANSI proposal), which
would be equivalent to have a timestamp AND a local datetime; this is more
easy to deal with than timezones,
but (as was discussed here before) is too limited (does not allow
artithmetic) and is not orthogonal with the real useful types.)

Regarding implementation:

TIMESTAMP is straightforward, more or less the same as today: stored as
UTC, can be
input/output in ISO 8601 format (the client/server can use the offset they
like, internally it's translated to GMT+0)

LOCAL_DATETIME also is straightforward, also stored as UTC (as in GMT+0) .
BUT
- input/output in ISO 8601 format should not allow/produce GMT offset
- the similarity of implementation should not leak upwards. This types are
incompatible, cannot be compared, etc

DATETIME is the difficult one, of course.
- Equivalent to the pair {LOCAL_DATETIME,TX_id} (occupies more space)
- Requires some catalog table or something akin to codify consistently the
timezones as numbers (included in pg_dump output?)
- Requires new definitions for input/output (and deal with some
ambiguities, particulary in DST transitions)
- Requires some semantic definitions (orderig, equality)
- Some arithmetics (which involve convertion to physic time) may be
expensive, might require some aggresive caching of time (zic) calculations.

Global considerations:
- Backward compatibility?
- SQL spec compatibility? (is worsened?) (deprecate TIMESTAMP WITH
TIMEZONE?)
- implement conversion functions - castings (how strict?)
- discuss/implement interval types/functions
- interfaces (JDBC...)

A bit of work, granted... (I might help)... but I bet that the most
difficult work, by far, is to
reach an agreement :-)

Thanks for reading.

Hernán J. González

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-11-19 19:33:47 Re: Question about ECPGset_noind_null() and ECPGis_noind_null()
Previous Message Boszormenyi Zoltan 2009-11-19 19:12:24 Re: Question about ECPGset_noind_null() and ECPGis_noind_null()