TM format can mix encodings in to_char()

From: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: TM format can mix encodings in to_char()
Date: 2019-04-12 16:45:51
Message-ID: CAC+AXB22So5aZm2vZe+MChYXec7gWfr-n-SK-iO091R0P_1Tew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hackers,

I will use as an example the code in the regression test
'collate.linux.utf8'.
There you can find:

SET lc_time TO 'tr_TR';
SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
to_char
-------------
01 NIS 2010
(1 row)

The problem is that the locale 'tr_TR' uses the encoding ISO-8859-9
(LATIN5),
while the test runs in UTF8. So the following code will raise an error:

SET lc_time TO 'tr_TR';
SELECT to_char(date '2010-02-01', 'DD TMMON YYYY');
ERROR: invalid byte sequence for encoding "UTF8": 0xde 0x75

The problem seems to be in the code touched in the attached patch.

Regards,

Juan Jose Santamaria Flecha

Attachment Content-Type Size
tm-format-mixes-encodings.patch application/octet-stream 3.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-04-12 16:46:56 Re: Attempt to consolidate reading of XLOG page
Previous Message Alvaro Herrera 2019-04-12 16:31:33 Re: Reducing the runtime of the core regression tests