Re: TM format can mix encodings in to_char()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: TM format can mix encodings in to_char()
Date: 2019-04-22 19:16:03
Message-ID: 22661.1555960563@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Sun, Apr 21, 2019 at 6:26 AM Andrew Dunstan
> <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>> How does one do that? Just set a Turkish locale?

> tr_TR is, in a sense, special among locales:
> http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
> The Turkish dotless i has apparently been implicated in all kinds of
> bugs in quite a variety of contexts.

Yeah, we've had our share of those :-(. But the dotless i is not the
problem here --- it happens to not trigger an encoding conversion
issue, it seems. Amusingly, the existing test case for lc_time = tr_TR
in collate.linux.utf8.sql is specifically coded to check what happens
with dotted/dotless i, and yet it manages to not trip over this problem.
(I suspect the reason is that what comes out of strftime is "Nis" which
is ASCII, and the non-ASCII characters only arise from subsequent case
conversion within PG.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-22 19:24:02 Re: pg_dump is broken for partition tablespaces
Previous Message Alvaro Herrera 2019-04-22 19:13:22 Re: pg_dump is broken for partition tablespaces