Re: TM format can mix encodings in to_char()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: TM format can mix encodings in to_char()
Date: 2019-04-20 15:50:01
Message-ID: 24472.1555775401@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Hmm. I'd always imagined that the way that libc works is that LC_CTYPE
> determines the encoding (codeset) it's using across the board, so that
> functions like strftime would deliver data in that encoding.
> [ and much more based on that ]

After further study of the code, the situation seems less dire than
I feared yesterday. In the first place, we disallow settings of
LC_COLLATE and LC_CTYPE that don't match the database encoding, see
tests in dbcommands.c's check_encoding_locale_matches() and in initdb.
So that core functionality will be consistent in any case.

Also, I see that PGLC_localeconv() is effectively doing exactly what
you suggested for strings that are encoded according to LC_MONETARY
and LC_NUMERIC:

    encoding = pg_get_encoding_from_locale(locale_monetary, true);

    db_encoding_convert(encoding, &worklconv.int_curr_symbol);
    db_encoding_convert(encoding, &worklconv.currency_symbol);
    ...

This is a little bit off, now that I look at it, because it's
failing to account for the possibility of getting -1 from
pg_get_encoding_from_locale. It should probably do what
pg_bind_textdomain_codeset does:

    if (encoding < 0)
        encoding = PG_SQL_ASCII;

since passing PG_SQL_ASCII to the conversion will have the effect of
validating the data without any actual conversion.
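
The fallback described above can be sketched in isolation. This is only an illustration, not PostgreSQL source: the DEMO_* constants and both demo_* functions are hypothetical stand-ins for the real encoding IDs and for pg_get_encoding_from_locale(), and the lookup recognizes just two codeset suffixes.

```c
#include <string.h>

/* Hypothetical encoding IDs; the real ones are defined by PostgreSQL. */
enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8 = 1,
    DEMO_LATIN1 = 2
};

/*
 * Hypothetical stand-in for pg_get_encoding_from_locale(): looks at the
 * codeset suffix of a locale name and returns -1 when it cannot tell.
 */
static int
demo_encoding_from_locale(const char *locale)
{
    const char *dot = strchr(locale, '.');

    if (dot == NULL)
        return -1;              /* no codeset in the name, e.g. "C" */
    if (strcmp(dot + 1, "UTF-8") == 0)
        return DEMO_UTF8;
    if (strcmp(dot + 1, "ISO-8859-1") == 0)
        return DEMO_LATIN1;
    return -1;                  /* unrecognized codeset */
}

/*
 * Apply the fallback: an unknown encoding is treated as SQL_ASCII, so a
 * later conversion step would only validate the bytes, not recode them.
 */
static int
demo_encoding_or_ascii(const char *locale)
{
    int         encoding = demo_encoding_from_locale(locale);

    if (encoding < 0)
        encoding = DEMO_SQL_ASCII;
    return encoding;
}
```

With these stand-ins, demo_encoding_or_ascii("en_US.UTF-8") yields DEMO_UTF8, while "C" or a name with an unrecognized codeset yields DEMO_SQL_ASCII instead of passing -1 along to the conversion step.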

I remain wary of this idea because it's depending on something that's
undefined per POSIX, but apparently it's working well enough for
LC_MONETARY and LC_NUMERIC, so we can probably get away with it for
LC_TIME as well. Anyway the current code clearly does not work on
glibc, and I also verified that there's a problem on FreeBSD, so
this patch should make things better.

Also, experimentation suggests that LC_MESSAGES actually does work
the way I thought this stuff works, i.e., its implied codeset isn't
really used. (I think this only matters for strerror(), since we
force the issue for gettext, but glibc's strerror() is clearly not
paying attention to that.) Sigh, who needs consistency?

regards, tom lane
