Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Date: 2019-07-04 05:57:19
Message-ID: 87k1cypeym.fsf@news-spur.riddles.org.uk
Lists: pgsql-committers pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> I'm dubious that relying on zone[1970].tab would improve matters
Tom> substantially; it would fix some cases, but I don't think it would
Tom> fix all of them. Resolving all ambiguous zone-name choices is not
Tom> the charter of those files.

Allowing zone matching by _content_ (as we do) rather than by name does
not seem to be supported in any respect whatever by the upstream data;
we've always been basically on our own with that.

[tl;dr for what follows: my proposal reduces the number of discrepancies
from 91 (see previously posted list) to 16 or 7, none of which are new]

So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT even
down to the abbreviation, and we don't want to guess Africa/Abidjan for
all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes
distinctly-named zones which are distinct only for times in the far past
(before 1920 or 1905 respectively). They are otherwise identical by
content. We therefore end up choosing arbitrarily.
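
To make "identical by content" concrete, here is a minimal sketch of
grouping zones by their behavior from some cutoff onward. It is not the
algorithm PostgreSQL actually uses: the zone names, the rule data, the
cutoff parameter, and the helpers fingerprint() and group_by_content()
are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data, NOT real tzdata. Each zone is a list of
# (year_effective, utc_offset_seconds, abbreviation), sorted by year.
TOY_ZONES = {
    "Example/Old_Capital": [(1900, 24000, "LMT"), (1920, 25200, "+07")],
    "Example/New_Port": [(1900, 24900, "LMT"), (1920, 25200, "+07")],
    "Example/Elsewhere": [(1900, 28800, "+08")],
}


def fingerprint(rules, cutoff_year):
    """Describe a zone's behavior from cutoff_year onward as a hashable value."""
    # The rule in force at the cutoff is the last one starting at or before it.
    in_effect = [r for r in rules if r[0] <= cutoff_year]
    later = [r for r in rules if r[0] > cutoff_year]
    # Drop the start year of the rule in force: only offset and abbreviation
    # are observable once we are past the cutoff.
    start = in_effect[-1][1:] if in_effect else None
    return (start, tuple(later))


def group_by_content(zones, cutoff_year):
    """Return sorted lists of zone names that share a fingerprint."""
    groups = defaultdict(list)
    for name, rules in zones.items():
        groups[fingerprint(rules, cutoff_year)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    print(group_by_content(TOY_ZONES, 1970))
    print(group_by_content(TOY_ZONES, 1910))
```

In this toy data the two zones that differ only before 1920 collapse
into one group when the cutoff is later than that, which is the same
shape of problem as the two pairs above: nothing in the content says
which name to pick.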

In addition, the following collection of random islands have time zones
that lack local abbreviation names, recent offset changes, or DST, and
are therefore indistinguishable by content from fixed-offset zones like
Etc/GMT+2:

  Etc/GMT-4 ==
    Indian/Mahe
    Indian/Reunion

  Etc/GMT-7 == Indian/Christmas
  Etc/GMT-9 == Pacific/Palau
  Etc/GMT-10 == Pacific/Port_Moresby
  Etc/GMT-11 == Pacific/Guadalcanal

  Etc/GMT-12 ==
    Pacific/Funafuti
    Pacific/Tarawa
    Pacific/Wake
    Pacific/Wallis

  Etc/GMT+10 == Pacific/Tahiti
  Etc/GMT+9 == Pacific/Gambier

  Etc/GMT+2 == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names because those names
are shorter. If we chose to prefer zone.tab names over Etc/* names for all
of these, we'd be ambiguous only for a handful of relatively small islands.
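
As a rough illustration of the two tie-break policies, here is a small
sketch using the Etc/GMT-12 group from the list above. The function
pick_name() and its prefer_zone_tab flag are hypothetical, the set of
zone.tab names is limited to the four listed in this message, and
breaking equal-length ties alphabetically is an assumption made so the
result is deterministic.

```python
def pick_name(candidates, zone_tab_names, prefer_zone_tab):
    """Choose one name from a group of zones that are identical by content."""
    pool = list(candidates)
    if prefer_zone_tab:
        listed = [name for name in pool if name in zone_tab_names]
        # Fall back to the whole group if none of the names is listed.
        if listed:
            pool = listed
    # Shortest name wins; equal lengths are broken alphabetically (assumed).
    return min(pool, key=lambda name: (len(name), name))


# One content-equivalent group, taken from the list in this message.
GROUP = [
    "Etc/GMT-12",
    "Pacific/Funafuti",
    "Pacific/Tarawa",
    "Pacific/Wake",
    "Pacific/Wallis",
]
# Stand-in for the names a zone.tab-style file would contribute.
ZONE_TAB = {name for name in GROUP if not name.startswith("Etc/")}

if __name__ == "__main__":
    print(pick_name(GROUP, ZONE_TAB, prefer_zone_tab=False))
    print(pick_name(GROUP, ZONE_TAB, prefer_zone_tab=True))
```

With the flag off, the Etc/ name wins on length. With it on, the choice
is made among the four island zones only, and which of them wins is
still arbitrary, which is the residual ambiguity described above.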

--
Andrew (irc:RhodiumToad)
