Re: storing TZ along timestamps

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, "alvherre(at)alvh(dot)no-ip(dot)org Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: storing TZ along timestamps
Date: 2011-07-21 22:30:48
Message-ID: CAFNqd5W5smcK-TjN0BNWtxk2PyK4=1kDa5GbJT44cLZOfpmQ4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 5:48 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote:
>>> I have my doubts about that, and I hope not.  These details haven't been
>>> discussed at all; I only started this thread to get community approval
>>> on cataloguing the TZs.
>>
>> I am strongly in favor of having a *timezone* data type and some system
>> whereby we can uniquely identify timezones in the Zic database.  That
>> would be tremendously useful for all sorts of things.  I'm just
>> asserting that those who want a composite timestamp+saved-time-zone data
>> type have not thought about all of the complications involved.
>
> Having to deal with timezone's completely separate from their timestamps is a huge PITA. That said, if we had a timezone datatype there's at least the possibility of using a composite type to deal with all of this. Or at least we can just create a custom datatype using existing tools... the only part of this that I see that actually requires closer core support is the timezone data itself.
>
> So if the community is OK with adding a timezone datatype then we can focus on that and leave the timestamptztz data type as an add-on (at least assuming we don't run into any gotchas).

As I have been watching this whole thread, my inclination has been to
look at this from a "Prolog" perspective, where we think about the
database as indicating a series of assertions about facts, from which
we then try to reason.

I suspect that determining what *really* needs to get recorded depends
on this. And it seems to me that trying to head down the path of
defining oid-based lookups of timezone names may be putting the cart
before the horse.

There are a number of facts about a timestamp:

1. What time did the database server think it was?

SELECT NOW();

captures the database's concept of what time it was, complete with:
a) The time, based, I think, on UT1. With the caveat that there's no
certainty that the DB server's time is necessarily correct.
b) An encoding of the timezone offset based on the value of the
TimeZone GUC for this connection.

If one is running an NTP daemon, pointing to a decently-connected
network of NTP servers, then it's likely that this time is pretty
accurate. And most of the time, I'd be inclined to treat this as
authoritative, and contend that anything else is likely to be less
correct and less easy to work with.

The goal of this discussion thread is to record another timestamp with
a different basis. It's not entirely clear what is its basis. I'll
suggest one, which mostly underlines my contention that it's likely
"less correct" and "less easy to work with" than having a column
defined as...
some_timestamp timestamp with timezone default NOW()

2. Client-based timestamp, comprising two things:

a) A time, ascertained by the client.
b) A timezone, ascertained by the client.

Note that timezones are pretty open-ended. There is an authoritative
encoding defined in the tz database, but there are other values used
out there. We had to patch Slony-I to have it use 'ISO' timestamps,
and recommend running in GMT/UTC, because there are values that blow
things up.

For instance, on AIX, there is a habit for boxes to set TZ=CUT0, out
of the box, which isn't on what PostgreSQL considers to be the
"official list."

On the more whimsical side of things, Joey Hess, a Debian developer
noted for such things as ikiwiki, etckeeper, git-annex, decided to
create his very own custom timezone, "JEST", because he was irritated
about DST.

http://kitenet.net/~joey/blog/entry/howto_create_your_own_time_zone/
http://kitenet.net/~joey/blog/entry/JEST_results/

That "whimsical" entry won't be going into tzdata, and while we could
discount this case as whimsy, it's not "out there" for organizations
such as nation states to decide to legislate their own things, that we
can't be certain will necessarily get into tzdata.

There are enough aliases and possibilities of local national decisions
to make it at least somewhat troublesome to treat this as something
that can be considered fixed down to the OID level.

My conclusion would be that if someone is really, really, really keen
on capturing their own notion of timezone, then this fits with the
notion that, if they want to have something that could be treated as
remotely authoritative, they should capture a multiplicity of pieces
of datestamp information, and actively accept that this will be pretty
duplicative.

- I'd commend capturing NOW() in a timestamptz field. That gives you:
1. What time the DB server thought it was, in terms of UT1
2. What timezone it thought was tied to that connection.
- Also, I'd be inclined to capture, in plain text form:
3. A client-recorded timestamp. I'm agnostic as to whether this has
*any* validation done on it; I'd think it plausible that this is
simply a text field, that might require a human to interpret it.
4. A client-recorded timezone. This would be a plain text field, and
I'm not certain it's of any particular value to try to validate it
against any would-be authoritative list. Why shouldn't Joey be able
to use JEST?
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-07-21 22:43:29 Re: sinval synchronization considered harmful
Previous Message Florian Pflug 2011-07-21 22:22:09 Re: sinval synchronization considered harmful