Re: Allow to_date() and to_timestamp() to accept localized names

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Arthur Zakirov <zaartur(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow to_date() and to_timestamp() to accept localized names
Date: 2020-03-08 21:43:42
Message-ID: CAAaqYe9JT2Yq-CO5yKiv3+HBEnye6abEL9zhS+DQ9QWWX98J-A@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 8, 2020 at 2:19 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > James Coleman <jtc331(at)gmail(dot)com> writes:
> >> I'm still interested in understanding why we're using the ISO locale
> >> instead of the utf8 one in a utf8-labeled test though.
>
> > We are not. My understanding of the rules about this is that the
> > active LC_CTYPE setting determines the encoding that libc uses,
> > period. The encoding suffix on the locale name only makes a
> > difference when LC_CTYPE is being specified (or derived from LANG or
> > LC_ALL), not any other LC_XXX setting --- although for consistency
> > they'll let you include it in any LC_XXX value.
>
> Oh wait --- I'm wrong about that. Looking at the code in pg_locale.c,
> what actually happens is that we get data in the codeset implied by
> the LC_TIME setting and then translate it to the database encoding
> (cf commit 7ad1cd31b). So if bare "tr_TR" is taken as implying
> iso-8859-9, which seems likely (it appears to work that way here,
> anyway) then this test is exercising the codeset translation path.
> We could change the test to say 'tr_TR.utf8' but that would give us
> less test coverage.
>

So just to confirm I understand: that implies the issue is solely that only
the utf8 tr_TR locale is installed by default on this machine, while the test
has a hard requirement on the iso-8859-9 one (that is, the test is explicitly
testing a codepath that generates utf8 results from a non-utf8 source)?
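
To illustrate the translation path described in the quoted text, here is a minimal Python sketch. It is not the pg_locale.c code, which is C. It assumes, as guessed above, that bare "tr_TR" implies iso-8859-9, and the function and constant names are hypothetical. It takes a month name as bytes in the LC_TIME codeset and re-encodes it into a utf8 database encoding.

```python
# Hypothetical illustration only; PostgreSQL does this in C (pg_locale.c).
# Assumption: bare "tr_TR" implies the ISO-8859-9 codeset.
LC_TIME_CODESET = "iso8859_9"
DATABASE_ENCODING = "utf_8"


def to_database_encoding(raw, source=LC_TIME_CODESET, target=DATABASE_ENCODING):
    """Re-encode bytes from the LC_TIME codeset into the database encoding."""
    return raw.decode(source).encode(target)


# "Şubat" (February) as ISO-8859-9 bytes: 0xDE is capital S with cedilla.
raw_month = b"\xdeubat"
converted = to_database_encoding(raw_month)
print(converted.decode(DATABASE_ENCODING))
```

If only tr_TR.utf8 is installed, there is no non-utf8 source to start from, which would explain why a test that relies on this path cannot run as written.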

If so, I'm going to try a bare Ubuntu install on a VM and see what locales
are installed by default for Turkish.
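
A rough way to do that check is to filter the output of `locale -a` for Turkish entries. The sketch below assumes a system that has the `locale` command; the helper names are hypothetical.

```python
import subprocess


def turkish_locales(listing):
    """Return the sorted tr_TR entries found in `locale -a` style output."""
    names = (line.strip() for line in listing.splitlines())
    return sorted(n for n in names if n.lower().startswith("tr_tr"))


def installed_locales():
    """Run `locale -a`; return an empty string if the command is unavailable."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",  # some locale aliases are not valid UTF-8
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout


if __name__ == "__main__":
    print(turkish_locales(installed_locales()))
```

A result that contains only a utf8 entry would match the situation described above, where the iso-8859-9 variant is missing.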

If in fact Ubuntu doesn't install this locale by default, then is this a
caveat we should add to developer docs somewhere? It seems odd to me that
I'd be the only one encountering it, but OTOH I would have thought this a
fairly vanilla install too...

James
