Re: Calendar support in localization

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(dot)fearing(at)enterprisedb(dot)com>
Subject: Re: Calendar support in localization
Date: 2021-03-26 17:51:33
Message-ID: af152109-a986-46ef-a8ad-4f3a14718ee9@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro wrote:

> Right, so if this is done by trying to extend Daniel Verite's icu_ext
> extension (link given earlier) and Robert's idea of a fast-castable
> type, I suppose you might want now()::icu_date + '1 month'::internal
> to advance you by one Ethiopic month if you have done SET
> icu_ext.ICU_LC_TIME = 'am_ET(at)calendar=traditional'.

I've pushed a calendar branch on icu_ext [1] with preliminary support
for non-gregorian calendars through ICU, so far with only format and parse
of timetamptz.
The ICU locale drives both the localization of field names (language) and the
choice of calendar.

It looks like this:

\set fmt 'dd/MMMM/yyyy GGGG HH:mm:ss.SSS zz'

WITH list(cal) AS ( values
('gregorian'),
('japanese'),
('buddhist'),
('roc'),
('persian'),
('islamic-civil'),
('islamic'),
('hebrew'),
('chinese'),
('indian'),
('coptic'),
('ethiopic'),
('ethiopic-amete-alem'),
('iso8601'),
('dangi')
),
fmt AS (
select
cal,
icu_format_date(now(), :'fmt', 'fr(at)calendar='||cal) as now_str,
icu_format_date(now()+'1 month'::interval, :'fmt', 'fr(at)calendar='||cal) as
plus_1m
from list
)
SELECT
cal,
now_str,
icu_parse_date(now_str, :'fmt', 'fr(at)calendar='||cal) as now_parsed,
plus_1m,
icu_parse_date(plus_1m, :'fmt', 'fr(at)calendar='||cal) as plus_1m_parsed
FROM fmt;

-[ RECORD 1 ]--+-------------------------------------------------------
cal | gregorian
now_str | 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 2 ]--+-------------------------------------------------------
cal | japanese
now_str | 26/mars/0033 Heisei 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/0033 Heisei 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 3 ]--+-------------------------------------------------------
cal | buddhist
now_str | 26/mars/2564 ère bouddhique 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/2564 ère bouddhique 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 4 ]--+-------------------------------------------------------
cal | roc
now_str | 26/mars/0110 RdC 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/0110 RdC 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 5 ]--+-------------------------------------------------------
cal | persian
now_str | 06/farvardin/1400 Anno Persico 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 06/ordibehešt/1400 Anno Persico 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 6 ]--+-------------------------------------------------------
cal | islamic-civil
now_str | 12/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 7 ]--+-------------------------------------------------------
cal | islamic
now_str | 13/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 8 ]--+-------------------------------------------------------
cal | hebrew
now_str | 13/nissan/5781 Anno Mundi 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 14/iyar/5781 Anno Mundi 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 9 ]--+-------------------------------------------------------
cal | chinese
now_str | 14/èryuè/0038 78 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 15/sānyuè/0038 78 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 10 ]-+-------------------------------------------------------
cal | indian
now_str | 05/chaitra/1943 ère Saka 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 06/vaishākh/1943 ère Saka 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 11 ]-+-------------------------------------------------------
cal | coptic
now_str | 17/barmahât/1737 après Dioclétien 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/barmoudah/1737 après Dioclétien 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 12 ]-+-------------------------------------------------------
cal | ethiopic
now_str | 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/miyazya/2013 après l’Incarnation 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 13 ]-+-------------------------------------------------------
cal | ethiopic-amete-alem
now_str | 17/mägabit/7513 ERA0 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/miyazya/7513 ERA0 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 14 ]-+-------------------------------------------------------
cal | iso8601
now_str | 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 15 ]-+-------------------------------------------------------
cal | dangi
now_str | 14/èryuè/0038 78 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 15/sānyuè/0038 78 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02

I understand that adding months or years with some of the non-gregorian
calendars should lead to different points in time than with the gregorian
calendar.

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

I'm not sure at this point that there should be a new set of
data/interval/timestamp types though, especially if considering
the integration in core.

About intervals, if there were locale-aware functions like
add_interval(timestamptz, interval [, locale]) returns timestamptz
or
sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
that would use ICU to compute the results according to the locale,
wouldn't it be good enough?

Another argument for new datatypes could be that getting the
localized-by-ICU display/parsing without function calls around the dates
means new I/O functions. In the context of the extension, probably,
but in core, if DateStyle is extended to divert the I/O of date/timestamp[tz]
to ICU, I guess it could work with the existing types.

Another reason to have new datatypes could be that users would like
to use a localized calendar only on specific fields. I don't know if that
makes sense.

[1] https://github.com/dverite/icu_ext/tree/calendar

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Finnerty, Jim 2021-03-26 18:02:31 Re: Challenges preventing us moving to 64 bit transaction id (XID)?
Previous Message Magnus Hagander 2021-03-26 17:20:34 Re: invalid data in file backup_label problem on windows