Re: to_timestamp() and timestamp without time zone

From: hernan gonzalez <hgonzalez(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: to_timestamp() and timestamp without time zone
Date: 2011-06-24 17:37:43
Message-ID: BANLkTinoZpzHxd9tMAJFmnZfmUk05LV+5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> As I understand it, documentation patches are welcomed:)

I'd indeed wish some radical changes to the documentation.

To start with, the fundamental data type names are rather misleading; SQL
standard sucks here, true, but Postgresql also has its idiosincracies, and
the docs do not help much:

http://www.postgresql.org/docs/9.0/interactive/datatype-datetime.html

The most confusing thing is that PG's "TIMESTAMP WITH TIMEZONE" is NOT... a
timestamp with timezone! (not even in the crippled sense of the SQL
standard, which thinks of a "timestamp with offset"). It actually has no
relation with timezones. It's actually a plain "timestamp", as the world is
commonly used in computing, the "INSTANT" of time in which (typically) an
event happened. It's a physical concept, not related with civil things (even
with calendars). Typical example: the "modification time" of a file in a
unix filesystem: here the timestamp is stored internally as a Unix integer
time (seconds, relative to the Unix epoch), and when doing a "ls" the OS
displays it using the current timezone; but it's very clear that changing
the timezone only changes the output representation. The same happens in
Postgresql. A stored "TIMESTAMP WITH TIMEZONE" will be the same physical
instant (say, the instant for the first plane crash at S/11) regardless of
the timezone that is used (ONLY!) for output/input representation:
'2011-09-11 08:46:40-04' and '2011-09-11 09:46:40-03' are the same INSTANT.

db=# set TIMEZONE='US/Eastern';
db=# select inst from test_dates where rid=5;
2011-09-11 08:46:40-04
db=# set TIMEZONE='America/Argentina/Buenos_Aires';
db=# select inst from test_dates where rid=5;
2011-09-11 09:46:40-03

This is the type that should normally be used to record the time at which an
event happened (typically a record modification - like MYSQL uses the world
"TIMESTAMP").

On the other hand, "TIMESTAMP WITHOUT TIMEZONE" is a wholy different concept
(neither 'wider' or narrow' type than the other). It's just the local
calendar time, it's (conceptually) like a tuple of
numbers{year,month,day,hour,min,sec, fracsecs}, the "date and time" as it's
understood in the business-civil world, with the timezone information
missing.
This is the type that should be used for that concept, when it's critical
for me that 'If I stored "2011,09,11 at 23:59:59", I want to get that
precise calendar date, no matter what the server timezone is, no matter it
changes, or the DST rules changes, or have bugs or anything': timezones
should not be involved AT ALL when manipulating them(parsing, outputing,
storing, retrieving, comparing, extracting date or time, etc). Of course,
this is internally implemented (for mere convenience of space and
arithmetic) using a UTC timestamp (but this should not leak - from the user
point of view, everything should be as if the tuple of numbers was stored).

To resume:

"TIMESTAMP WITH TIMEZONE" is an INSTANT
"TIMESTAMP WITHOUT TIMEZONE" is a LOCALDATETIME

(BTW, this mirrors the Joda-Time JAVA API terminology
http://joda-time.sourceforge.net/ )

Neither of them has or implies a TIMEZONE. The first can accept/use one TZ
for input/ouput representation, that's all.

Docs should make clear this, and the fact that Postgresql currently lacks a
"FULL" datetime type. This would correspond conceptually to the tuple
{INSTANT,TIMEZONE} (which implies univocally also a LOCALDATETIME). Nor does
it implemented the SQL standard "TIMESTAMP WITH TIMEZONE" (actually a
{INSTANT,OFFSET} tuple). I'm ok with that, I agree that the later is
crippled, and the former is complex and not standard, and that both have
little support from upper layers.

The problem, for me, is that Postgresql does not emphasize the conceptual
difference between this types, and that it tends too easily IMO to "assume"
the missing information (the timezone) even when probably the user does not
want to make that assumption. For example, I hope someday PG disables the
implicit casting between these two types.

In general, I feel that sometimes Postgresql pretends to know too much about
the data. Say, when I store a local datetime (TIMESTAMP WITHOUT TIMEZONE), I
wish to tell him: "this a LOCAL datetime, please don't pretend EVER to know
its timezone, you don't need it; if I asked you to select all LOCAL DATE
TIMES before "2001/2/3 0:0:0", if you for doing that are invoking some
timezone convertion or intelligence, you are messsing with my data - stop
doing that".

You can see in this SO question some (non PG specific) discussion about
datetimes and DBs with potentially many timezones.
http://stackoverflow.com/questions/2532729/daylight-saving-time-and-timezone-best-practices/2533378#2533378
The most upvoted recipe is: "Persist globally, display locally". I.e., don't
pretend to store the full date time info, only the instant, let the upper
layers fill the tz info (perhaps from other data, the user profile, or
whatever) and do the display. I don't endorse this fully (sometimes the DB
must know, eg for doing comparisons and arithmetic), but sometimes I feel
that Postgresql lacks this perspective, and its mindset lies in the other
extreme: "I know everything (timezones), I do everything (display)".
When I read the PG docs I'm frequently uncomfortable to see many
explanations of fundamentals tied to "how PG parses/displays" the data. See
the docs for the TIMESTAMP types, where the difference between them is dealt
only in this context. As most developers, I'm using some client interface
(say, JDBC), why should I care about how PG converts data from/to human
readable form? I'm not asking it to do that, I tend to think, I'm just using
it for the persisting layer. Only gradually one (I) graps that
human-readable strings are at the core of PG data handling, and that client
interfaces must deal with that. It would be ludicrous for me to critize
that, all that has surely some deep justification. But anyway, from the
point of view of the app developer, it sounds strange sometimes.

BTW, I was curious to see how the JDBC client code dealt wit this, if he was
confortable and if all was clear and foolproof with these TIMESTAMP types at
that level. Well, not very - it seems. See http://goo.gl/L2Pzi , search for
"compromise". Probably JDBC is also to blame here, but anyway...

Regards

Hernan J Gonzalez

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Hiroshi Saito 2011-06-24 17:45:04 Re: UUID-OSP contrib module
Previous Message Rob Sargent 2011-06-24 17:22:42 glitch installing xml support in 9.1.beta2