Re: Tracking timezone abbreviation removals in the IANA tz database

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Tracking timezone abbreviation removals in the IANA tz database
Date: 2016-09-02 13:56:50
Message-ID: 9104.1472824610@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> So I'm leaning to the just-remove-it answer for any deleted abbreviation
> that relies on a dynamic definition. Names that never had more than one
> UTC offset can remain in the tznames list.

After a bit more thought and consumption of caffeine, I am thinking that
that won't really be good enough. It's clear that the IANA crowd intend
to continue removing made-up abbreviations; in fact, there's a pile of
such removals in their queue right now:
http://mm.icann.org/pipermail/tz/2016-August/023941.html

The trouble from our perspective is that a lot of those abbreviations have
shifted meaning over the years (if they had any real-world usage maybe
they'd have stayed more stable...), so we are using dynamic abbreviations
for them, and those break as soon as the tznames list diverges from the
IANA database. That would be manageable if we only shipped those filesets
together, but in installations built with --with-system-tzdata (which
ought to include at least most vendor distributions of PG), they don't
come from the same place. This means that even if we fix removals when we
make new releases, other abbreviations may break the next time the vendor
updates their tzdata package.

We could maybe tolerate individual abbreviations failing like this, but
as noted in bug #14307, the pg_timezone_abbrevs view fails altogether if
there are any broken abbreviations in the active timezone_abbreviations
list. So that makes this a problem for all users whether or not they care
about the specific abbreviations affected.

This leads me to think that we need to redefine the dynamic abbreviations
feature so it's a bit more robust in the face of this type of situation.

A really simple change (at least logically; I haven't looked at the code)
would be to say that a dynamic abbreviation is just a macro for the
referenced zone name. That is, if we have

NOVT Asia/Novosibirsk

in the timezone_abbreviations list then writing NOVT in a timezone value
is exactly equivalent to writing Asia/Novosibirsk. However, that breaks
backwards compatibility (at least for non-broken abbreviations). Our
convention up to now has been that if you write a standard-time zone
abbreviation then what it means is your local standard-time UTC offset,
even if DST is currently prevailing in your zone. For example, DST is
currently in force in the USA, so writing "America/New_York" means UTC-4,
but "EST" means UTC-5 regardless. Likewise "EDT" means UTC-4 and will
still mean that when winter comes.
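
To make the existing convention concrete, here is a minimal Python sketch.
It is not PostgreSQL code: the FIXED_ABBREVS table, the utc_offset() helper
and the "April through October" DST rule are all assumptions made up for
the illustration, and real tzdata is more detailed.

```python
from datetime import datetime

# Fixed-offset abbreviations: each always means the same UTC offset (hours).
FIXED_ABBREVS = {"EST": -5, "EDT": -4}


def new_york_offset(when: datetime) -> int:
    """Toy stand-in for a tzdata lookup of America/New_York.

    Assumption for the sketch: DST runs from April through October.
    The real rule (and the real tzdata) is more detailed.
    """
    return -4 if 4 <= when.month <= 10 else -5


def utc_offset(tz: str, when: datetime) -> int:
    """Return the UTC offset in hours for a zone name or an abbreviation."""
    if tz in FIXED_ABBREVS:
        # An abbreviation ignores whether DST is in force at `when`.
        return FIXED_ABBREVS[tz]
    if tz == "America/New_York":
        # A full zone name follows whatever offset prevails at `when`.
        return new_york_offset(when)
    raise LookupError(f"unknown time zone: {tz}")


september = datetime(2016, 9, 2)
january = datetime(2017, 1, 15)
```

With this model, utc_offset("America/New_York", september) gives -4 while
utc_offset("EST", september) still gives -5. A pure macro expansion of the
abbreviation to the zone name would lose that distinction.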

So the idea I'm toying with (again, haven't tried to code this) is to say
that *if* we can match the abbreviation to something in the referenced
zone then we'll use that, but otherwise we fall back to treating the
abbreviation as a macro for the zone name. This would ensure that updates
to the IANA data could not break existing timezone_abbreviations entries
(at least, not unless IANA were to remove a zone name altogether, but they
have never done that AFAIR). An update could cause an abbreviation's
meaning to change, but that's true already; in fact, it's kind of the
whole point.
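
The match-then-fall-back rule described above could be sketched roughly as
follows. This is only an illustration in Python, not a proposal for the
actual C code: the dictionaries stand in for tzdata and for the
timezone_abbreviations list, the offsets are illustrative rather than
copied from the IANA database, and resolve() is a hypothetical helper. It
also ignores the question of which timestamp the abbreviation is being
interpreted at.

```python
# Stand-in for tzdata before and after IANA drops a made-up abbreviation.
# Each zone lists the abbreviations it defines (UTC offset in hours) and
# the offset currently in force. Values are illustrative only.
OLD_TZDATA = {
    "Asia/Novosibirsk": {"abbrevs": {"NOVT": 7}, "current_offset": 7},
    "America/New_York": {"abbrevs": {"EST": -5, "EDT": -4}, "current_offset": -4},
}
NEW_TZDATA = {
    "Asia/Novosibirsk": {"abbrevs": {"+07": 7}, "current_offset": 7},
    "America/New_York": {"abbrevs": {"EST": -5, "EDT": -4}, "current_offset": -4},
}

# Hypothetical dynamic entries: abbreviation -> referenced zone name.
DYNAMIC_ABBREVS = {"NOVT": "Asia/Novosibirsk", "EST": "America/New_York"}


def resolve(abbrev: str, tzdata: dict) -> int:
    """Resolve a dynamic abbreviation to a UTC offset in hours."""
    zone_name = DYNAMIC_ABBREVS[abbrev]
    zone = tzdata.get(zone_name)
    if zone is None:
        # Only the removal of the zone itself can still break an entry.
        raise LookupError(f"zone {zone_name} not found in tzdata")
    if abbrev in zone["abbrevs"]:
        # The zone still knows the abbreviation: keep its specific meaning.
        return zone["abbrevs"][abbrev]
    # Otherwise treat the abbreviation as a macro for the zone name.
    return zone["current_offset"]
```

Here resolve("NOVT", NEW_TZDATA) no longer fails after the abbreviation
disappears from the zone data; it falls back to the zone's current offset.
resolve("EST", NEW_TZDATA) still returns -5 even though the zone's current
offset is -4, so non-broken abbreviations keep their existing meaning.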

If we were to do that, then perhaps we would not need to remove existing
timezone_abbreviations entries even if IANA deems them obsolete. I'd
still be inclined to remove them from our sample data files, but users
would easily be able to put them back if they wanted.

Thoughts?

regards, tom lane
