Tracking timezone abbreviation removals in the IANA tz database

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Tracking timezone abbreviation removals in the IANA tz database
Date: 2016-09-02 12:55:13
Message-ID: 6189.1472820913@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

It turns out that some of the zone abbreviations shown in the IANA
timezone database don't correspond to any real-world usage, but were
more or less made up by the timezone database crew. And of late,
those folk have started to remove these invented abbreviations in favor
of just using numeric offsets. For example, part of the 2016f update
did this:

# Asia/Novosibirsk covers:
# 54 RU-NVS Novosibirsk Oblast

+# From Stepan Golosunov (2016-05-30):
+# http://asozd2.duma.gov.ru/main.nsf/(Spravka)?OpenAgent&RN=1085784-6
+# moves Novosibirsk oblast from UTC+6 to UTC+7.
+# From Stepan Golosunov (2016-07-04):
+# The law was signed yesterday and published today on
+# http://publication.pravo.gov.ru/Document/View/0001201607040064
+
Zone Asia/Novosibirsk 5:31:40 - LMT 1919 Dec 14 6:00
- 6:00 - NOVT 1930 Jun 21 # Novosibirsk Time
- 7:00 Russia NOV%sT 1991 Mar 31 2:00s
- 6:00 Russia NOV%sT 1992 Jan 19 2:00s
- 7:00 Russia NOV%sT 1993 May 23 # say Shanks & P.
- 6:00 Russia NOV%sT 2011 Mar 27 2:00s
- 7:00 - NOVT 2014 Oct 26 2:00s
- 6:00 - NOVT
+ 6:00 - +06 1930 Jun 21
+ 7:00 Russia +07/+08 1991 Mar 31 2:00s
+ 6:00 Russia +06/+07 1992 Jan 19 2:00s
+ 7:00 Russia +07/+08 1993 May 23 # say Shanks & P.
+ 6:00 Russia +06/+07 2011 Mar 27 2:00s
+ 7:00 - +07 2014 Oct 26 2:00s
+ 6:00 - +06 2016 Jul 24 2:00s
+ 7:00 - +07

According to the Russia rules, %s expands to nothing for standard time or
"S" for summer time, so that the old entries here had the effect of
defining NOVT and NOVST as meaning variously UTC+6, UTC+7, or UTC+8
at different times in the past. Now those abbreviations are gone,
so far as IANA is concerned.

I had thought that this wouldn't really affect us because PG's
interpretations of zone abbreviations are driven by the info in
timezone/tznames/ rather than the IANA files in timezone/data/.
I forgot however that the "dynamic abbreviation" code relies on
being able to find matching abbreviations in the IANA data.
For example, timezone/tznames/Default lists

NOVST Asia/Novosibirsk # Novosibirsk Summer Time (obsolete)

but that zone abbreviation now fails entirely:

# select '2016-09-02 08:00:00 NOVST'::timestamptz;
ERROR: time zone abbreviation "novst" is not used in time zone "Asia/Novosibirsk"

(This is also the cause of bug #14307.)

So the question is what to do about it.

An easy answer is to just delete such entries from the timezone/tznames/
files altogether. If we believe the IANA crew's assumption that nobody
uses these abbreviations in the real world, then that seems like it
removes useless maintenance overhead for little cost. I'm a little
worried though that somebody might have followed IANA's lead and actually
started using these abbreviations, in which case we'd get complaints.

Another possibility is to keep these entries but get rid of the dynamic
zone aliases, reducing them to plain numeric UTC offsets. Probably the
value to use would be whatever was shown as the most recent active value
in the IANA list ... although that's open to interpretation. For
instance, according to the above we'd likely define NOVT as UTC+7,
but then logically NOVST ought to be UTC+8, which doesn't match up
with the fact that when it actually last appeared in the IANA data
it meant UTC+7. So that sounds like it'd be a bit of a mess involving
some judgment calls.

In the end, part of the reason we've got these files is so that users
can make their own decisions about abbreviation meanings. So the
ultimate answer to any complaints is going to be "if you think NOVT
means thus-and-such then put in an entry that says so".

So I'm leaning to the just-remove-it answer for any deleted abbreviation
that relies on a dynamic definition. Names that never had more than one
UTC offset can remain in the tznames list.

Comments?

BTW, we should be glad that the IANA crew decided to use ISO not POSIX
sign convention for these numeric zone abbreviations, or we'd really
have a mess on our hands :-(.

Also, the fact that this failure made it into releases shows that I'm
missing some now-necessary validation steps in the zone data update
procedure. Will add something about that to the timezone/README file.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-09-02 13:01:58 Re: [PATCH] Transaction traceability - txid_status(bigint)
Previous Message Craig Ringer 2016-09-02 12:38:39 Re: [PATCH] Transaction traceability - txid_status(bigint)