Re: Timezones (in 8.5?)

From: hernan gonzalez <hgonzalez(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timezones (in 8.5?)
Date: 2009-11-17 20:31:54
Message-ID: 48692c2d0911171231h6ab16a64yc4db35a6e26909e0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Perhaps the OP should explain exactly what real-world problems he's
> trying to solve.  As noted in the discussion you linked, there's not
> a lot of enthusiasm around here for getting closer to the spec's
> datetime handling simply because it's the spec; that part of the spec
> is just too broken for that to be a credible argument.

I'm not much interested in the compliance with the ANSI SQL spec, I
agree in this regard it is unsatisfactory (to put it midly).
But I'm also disatisfied with the current Postgresql implementation,
the types TIMESTAMP and TIMESTAMP WITH TIMEZONE are in the middle of
being SQL compliant and being really useful. The support of timezones
is really crippled now.

I understand, though, that backward compatibily is critical, and I'm
surely unaware of many implementation issues.
Anyway (long rambling follows - and excuse my english)...

We know that, even ignoring ANSI spec and postgresql compatibility for
one moment, even before considering date-time arithmetic and DTS
issues, date-time handling is notoriously difficult to formalize
satisfactorily. And, come to look at it, it's not a Postgresql
problem, nor a SQL problem: I believe there is NO standard for
store/serialize/represent a "date-time value", with all the
complexities that the concept has in human usage (ISO 8601, as
ANSI-SQL, just considers GMT offsets, not real timezones).

Let me present a simple real world scenario -to look at not from the
implementation point of view, but from the user:

- John records in his calendar a reminder for some event at datetime
2010-Jul-27, 10:30:00, with TZ "Chile/Santiago", (GMT+4 hence it
corresponds to UTC time 2010-Jul-27 14:30:00). But some days
afterwards, his government decides to change the country TZ to GMT+5.

Now, when the day comes... should that reminder trigger at
A) 2010-Jul-27 10:30:00 "Chile/Santiago" = UTC time 2009-Jul-27 15:30:00
or
B) 2010-Jul-27 9:30:00 "Chile/Santiago" = UTC time 2009-Jul-27 14:30:00 ?

There is no correct answer, unless one knows what John actually meant
when he said "please ring me at "2010-Jul-27, 10:30:00
TZ=Chile/Santiago"
Did he mean a "civil date-time" ("when the clocks in my city tell
10:30")? In that case, A) is the correct answer.
Or did he mean a "physical instant of time", a point in the continuus
line of time of our universe, say, "when the next solar eclipse
happens". In that case, answer B) is the correct one.

I believe that this distinction between two realms: one related to
(say) "physical time" and the other to (say) "civil date-time", is the
key to put some order... conceptually, at least (I'm not speaking
about feasibility for now). This is the approach of some Date-Time
APIs, for example the "Joda" Java library
http://joda-time.sourceforge.net/ (headed to replace soon
https://jsr-310.dev.java.net/ the original ugly JDK Date-Calendar
API) and I believe it's the right way.

In this approach, we would have two entirely different types (or
family of types) -no castings allowed.
An "instant" is a "physical time", a point in the time continuum.
A "partial date time spec" (or "partial civil datetime") is just a
tuple of values {year,month,day, hour,min,sec,usec,TZ_id} some of
which might be empty/unspecified.
Conversion from "instant" to "civil datetime" is only allowed if a TZ
is also specified (well, also a "Calendar" spec, if non-gregorian
dates are to be dealt with).
Conversion from "partial civil datetime" to "instant" is only allowed
if all fields are non-empty (again, assuming a "Calendar").
Similar distintion goes for "intervals" or "durations".

Postgresql implementation (and ANSI-SQL), for all date-time data,
revolves around the "physical time" concept: that is what it is
ultimately stored, that's what it's tought as the "real thing" (the
rest are input/output and arithmetic issues).
(Rather disgressing: even the DATE type is treated as a point in time,
as a DateTime with time=00:00:00 ; I think this is bad, conceptually,
when I think of "2010-Jul-27" I think of a date, not of the instant of
time "2010-Jul-27 00:00:00", they are different concepts; this is NOT
analogous to INT 10 => FLOAT 10.0 )
Because of this (IMHO) conceptual limitation, the availabily of the
two types "TIMESTAMP" "TIMESTAMP WITH TIME ZONE" results,
unfortunately, much less useful than it could have been.

If I were to reimplement the date-time data types, without much
regarding ANSI-SQL standard and Postgresql compatibility (a little too
much to ask, I know) I'd propose:

TIMESTAMP: ("instante") just a point in time, purely physical (as it
name suggest!). UTC encoded.
(input format could accept unix time or standard datetime format, with
default/server TZ; output format could output explicit GMT offset, to
support dump/restore robustly)

DATETIME: (call it "TIMESTAMP WITH TIME ZONE" if you wish but... is a
very different thing)
a full "civil" date time specification {year,month,day,
hour,min,sec,usec,TZ} (Of course, internally it could be stored as UTC
+ TZ_id )
Can be converted to TIMESTAMP, (but no casting allowed!), but the
result might vary if the TZ tables are changed (see my example above).

LOCAL_DATETIME: a full "civil" date time specification
{year,month,day, hour,min,sec,usec} with NO TZ.
Cannot be converted to DATETIME (or TIMESTAMP), except if a TZ_id is
also specified.
(This type might be merged with DATETIME if we allow empty TZ_id
values; but NEVER assume the server TZ as default TZ when empty!)

DATE: just a civil "date" {year,month,date} . Can be converted to
LOCAL_DATIME only by adding the remaining fields.

And similar for intervals...

Of course, there are many implementation details (some "time zone"
type or codification - efficient caching of DATETIME
operations/conversions - lots of input/output formatting issues,
interfaces, dump/restore) - and above all, compatibily with pg and
ANSI. Anyway, I dream of seing Postgresql going this way :-)

Best regards

Hernán J. González
Buenos Aires, Argentina
http://hjg.com.ar/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2009-11-17 20:33:08 Re: plpgsql: open for execute - add USING clause
Previous Message Pavel Stehule 2009-11-17 20:27:38 Re: plpgsql: open for execute - add USING clause