Re: Capitalization of localized month and day names (to_char() with 'TMmonth', 'TMday', etc.)

From: Magnus Holmgren <magnus(dot)holmgren(at)millnet(dot)se>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Capitalization of localized month and day names (to_char() with 'TMmonth', 'TMday', etc.)
Date: 2021-10-08 10:05:03
Message-ID: 2495775.TQQlMpQ45J@utklippan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

onsdag 6 oktober 2021 kl. 16:01:49 CEST skrev du:
> =?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?=
<juanjo(dot)santamaria(at)gmail(dot)com> writes:
> > On Wed, Oct 6, 2021 at 11:09 AM Magnus Holmgren
> > <magnus(dot)holmgren(at)millnet(dot)se>>
> > wrote:
> >> There's just this tiny but seemingly obvious issue that I can't believe I
> >> haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in
> >> an
> >> English locale (en_US.UTF-8 at least). Names of months and weekdays are
> >> proper
> >> nouns and as such *always* capitalized in English, so that seems wrong to
> >> me.
> >
> > IMHO, the patterns of TO_CHAR() do as promised in the documentation [1]:
> > MONTH full upper case month name (blank-padded to 9 chars)
> > Month full capitalized month name (blank-padded to 9 chars)
> > month full lower case month name (blank-padded to 9 chars)
> >
> > What you are proposing looks more like a new feature than a bug.
>
> Yeah, this is operating as designed and documented. The idea that
> there should be a way to get "month name as it'd be spelled mid-sentence"
> is an interesting one, but I really doubt that anyone would thank us for
> changing TMmonth to act that way. (Perhaps a new format code or modifier
> would be easier to swallow?)

Yes, I see that it's working as designed and documented, but I contend that
the design is flawed for the reason I gave. I mean, you can't deny that names
of months and weekdays are always capitalized in English and certain other
languages, whereas in another set of languages they are not, can you? Perhaps
this is a conscious design choice with some reason behind it, but if so,
neither the PostgreSQL nor the Oracle documentation (https://docs.oracle.com/
cd/B12037_01/server.101/b10759/sql_elements004.htm#i34510) reveal it. What is
the use case for linguistically incorrectly lowercased localized month and day
names? What would such a change break?

I still suspect that whoever designed this didn't consider locale switching.
(Interestingly, "month", "mon", "day", and "dy" are locale-specific by
themselves; there is no "TM" prefix needed.

> I also wonder exactly how the code would figure out what to do ---
> language-specific conventions for this are not information available
> from the libc locale APIs, AFAIR.

I checked the code, and it looks like cache_locale_time() in src/backend/
utils/adt/pg_locale.c uses strftime(3) to produce the correctly capitalized
day and month names and abbreviations (format codes %A, %B, %a, and %b). All
that would be needed is not to force them to lowercase in DCH_to_char() in
src/backend/utils/adt/formatting.c.

What could a new, separate format code that doesn't do this look like?

--
Magnus Holmgren, developer
MILLNET AB

--
Vid e-postkontakt med Millnet är det normalt att åtminstone vissa
personuppgifter sparas om dig. Du kan läsa mer om vilka uppgifter som
sparas och hur vi hanterar dem på https://www.millnet.se/integritetspolicy/
<https://www.millnet.se/integritetspolicy/>.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-10-08 10:57:14 Reword docs of feature "Remove temporary files after backend crash"
Previous Message Aleksander Alekseev 2021-10-08 09:51:24 Re: Add ZSON extension to /contrib/