From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Remaining dependency on setlocale() |
Date: | 2025-07-08 00:56:03 |
Message-ID: | 6956823dd669abc64183fc91e64dc2e56e31f3ea.camel@j-davis.com |
Lists: | pgsql-hackers |
On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:
> > v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch
> >
> > As I mentioned earlier in the thread, I don't think we can do this for
> > LC_CTYPE, because otherwise system error messages would not come out in
> > the right encoding.
>
> Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
> set to datctype.
>
> Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
> danger of accidentally depending on that setting. Can the encoding be
> controlled with LC_MESSAGES instead of LC_CTYPE?
>
> Do you have an example of how things can go wrong?
I looked into this a bit, and if I understand correctly, the only
problem is with strerror() and strerror_r(). They depend on LC_MESSAGES
for the language but on LC_CTYPE for the encoding.
I attached some example C code to illustrate how strerror() is affected
by both LC_MESSAGES and LC_CTYPE. For example, with LC_CTYPE set to C the
German message loses its non-ASCII characters ("ü", "ä"), which come out
as "?":
$ ./strerror de_DE.UTF-8 de_DE.UTF-8
LC_CTYPE set to: de_DE.UTF-8
LC_MESSAGES set to: de_DE.UTF-8
Error message (from strerror(EILSEQ)): Ungültiges oder unvollständiges Multi-Byte- oder Wide-Zeichen
$ ./strerror C de_DE.UTF-8
LC_CTYPE set to: C
LC_MESSAGES set to: de_DE.UTF-8
Error message (from strerror(EILSEQ)): Ung?ltiges oder unvollst?ndiges Multi-Byte- oder Wide-Zeichen
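The attachment itself is not reproduced in this archive view. As a rough illustration only, here is a hypothetical helper in the spirit of the ./strerror runs above. The name show_strerror() and its signature are not taken from the attachment. It sets LC_CTYPE and LC_MESSAGES independently with setlocale() and then prints strerror(EILSEQ). Whether a translated message appears at all depends on the C library and on which locales are installed.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the attached strerror.c): set LC_CTYPE and
 * LC_MESSAGES independently, then print strerror(EILSEQ) to "out".
 * Returns 0 on success, or -1 if either locale cannot be set.
 */
static int
show_strerror(const char *ctype, const char *messages, FILE *out)
{
	/* On glibc, LC_CTYPE picks the encoding of the translated text. */
	if (setlocale(LC_CTYPE, ctype) == NULL)
		return -1;
	/* Query right away: the string setlocale() returns may be overwritten. */
	fprintf(out, "LC_CTYPE set to: %s\n", setlocale(LC_CTYPE, NULL));

	/* LC_MESSAGES picks the language (message catalog). */
	if (setlocale(LC_MESSAGES, messages) == NULL)
		return -1;
	fprintf(out, "LC_MESSAGES set to: %s\n", setlocale(LC_MESSAGES, NULL));

	fprintf(out, "Error message (from strerror(EILSEQ)): %s\n",
			strerror(EILSEQ));
	return 0;
}
```

A main() that calls show_strerror(argv[1], argv[2], stdout) would reproduce the command-line form shown above. The "?" substitutions in the second run presumably come from converting the German catalog text to the ASCII codeset of the C locale.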
On Unix-like systems, we could use newlocale() to initialize a global
locale_t variable with both LC_CTYPE and LC_MESSAGES set. However, the
LC_MESSAGES portion would need to be updated every time the lc_messages
GUC changes, which is not great.
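Here is a minimal sketch of the newlocale() idea. It assumes a platform with POSIX.1-2008 strerror_l(), which glibc provides; the email does not say which function would consume the locale_t. The names message_locale, init_message_locale(), update_message_locale() and message_strerror() are hypothetical. update_message_locale() is the part described above as "not great": every change to lc_messages means rebuilding the LC_MESSAGES portion of the object.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <string.h>

/* Hypothetical global: LC_CTYPE from datctype, LC_MESSAGES from the GUC. */
static locale_t message_locale = (locale_t) 0;

/* Build message_locale from scratch. Returns 0 on success, -1 on failure. */
static int
init_message_locale(const char *ctype, const char *messages)
{
	locale_t	base;
	locale_t	loc;

	base = newlocale(LC_CTYPE_MASK, ctype, (locale_t) 0);
	if (base == (locale_t) 0)
		return -1;

	/* On success newlocale() takes over "base"; on failure it is untouched. */
	loc = newlocale(LC_MESSAGES_MASK, messages, base);
	if (loc == (locale_t) 0)
	{
		freelocale(base);
		return -1;
	}

	if (message_locale != (locale_t) 0)
		freelocale(message_locale);
	message_locale = loc;
	return 0;
}

/* Would have to run every time lc_messages changes. */
static int
update_message_locale(const char *messages)
{
	locale_t	loc;

	if (message_locale == (locale_t) 0)
		return -1;
	loc = newlocale(LC_MESSAGES_MASK, messages, message_locale);
	if (loc == (locale_t) 0)
		return -1;				/* message_locale is still valid, unchanged */
	message_locale = loc;		/* the old object may have been reused or freed */
	return 0;
}

/* strerror() variant that uses message_locale, not the process locale. */
static const char *
message_strerror(int errnum)
{
	if (message_locale == (locale_t) 0)
		return strerror(errnum);
	return strerror_l(errnum, message_locale);
}
```

newlocale() may reuse or free its base argument on success. For that reason, update_message_locale() replaces the global pointer with the returned object rather than freeing the old one.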
Windows would be a different story, though. strerror() doesn't seem to
have a variant that accepts a _locale_t object, and even if it did, I
don't see a way to create a _locale_t object with LC_MESSAGES and
LC_CTYPE set to different values. One idea is to call
_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) [1] and then use
setlocale(), which would let us use setlocale() much as we use
uselocale() on other systems. That would be awkward, though.
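For comparison, here is a sketch of the per-thread approach. On Windows it follows the idea above: _configthreadlocale() followed by setlocale(). Elsewhere it uses uselocale(). The helper name strerror_in_ctype() is hypothetical, the Windows branch is untested, and the helper only switches LC_CTYPE. The POSIX branch assumes duplocale(LC_GLOBAL_LOCALE) is supported, as it is on glibc, and that strerror() follows the calling thread's locale. The sketch is meant to show the shape of the code, not a proposed implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy strerror(errnum) into buf while only the
 * calling thread uses the given LC_CTYPE. Returns 0 on success, -1 on error.
 */
static int
strerror_in_ctype(int errnum, const char *ctype, char *buf, size_t buflen)
{
#ifdef _WIN32
	/* Untested sketch: make setlocale() affect only this thread. */
	int			prev_mode;
	int			result = -1;
	char		saved[256];
	const char *cur;

	prev_mode = _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
	if (prev_mode == -1)
		return -1;
	cur = setlocale(LC_CTYPE, NULL);
	if (cur != NULL && strlen(cur) < sizeof(saved))
	{
		/* Copy the name: setlocale() may overwrite its returned string. */
		snprintf(saved, sizeof(saved), "%s", cur);
		if (setlocale(LC_CTYPE, ctype) != NULL)
		{
			snprintf(buf, buflen, "%s", strerror(errnum));
			setlocale(LC_CTYPE, saved);		/* restore by name, by hand */
			result = 0;
		}
	}
	_configthreadlocale(prev_mode);
	return result;
#else
	locale_t	base;
	locale_t	loc;
	locale_t	old;

	/* Start from the process-wide locale so LC_MESSAGES is preserved. */
	base = duplocale(LC_GLOBAL_LOCALE);
	if (base == (locale_t) 0)
		return -1;
	loc = newlocale(LC_CTYPE_MASK, ctype, base);
	if (loc == (locale_t) 0)
	{
		freelocale(base);
		return -1;
	}
	old = uselocale(loc);		/* affects the calling thread only */
	snprintf(buf, buflen, "%s", strerror(errnum));
	uselocale(old);
	freelocale(loc);
	return 0;
#endif
}
```

The awkwardness shows up in the Windows branch: the current LC_CTYPE name has to be saved and restored by hand, whereas uselocale() hands back the previous locale object directly.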
Thoughts? That seems like a lot of work just for the case of
strerror()/strerror_r().
Regards,
Jeff Davis
[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/configthreadlocale?view=msvc-170
Attachment | Content-Type | Size |
---|---|---|
strerror.c | text/x-csrc | 491 bytes |