Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2025-10-29 00:19:50
Message-ID: d9657a6e51aa20702447bb2386b32fea6218670f.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:
> On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:
> > On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql(at)j-davis(dot)com>
> > wrote:
> > > I don't have a great windows development environment, and it
> > > appears CI and the buildfarm don't offer great coverage either.
> > > Can I ask for a volunteer to do the windows side of this work?
> >
> > Me neither but I'm willing to help with that, and have done lots of
> > closely related things through trial-by-CI...

Attached a new patch series, v6.

Rather than creating new global locale_t objects, this series (along
with a separate patch for NLS[1]) removes the dependency on the global
LC_CTYPE entirely. It's a bunch of small patches that replace direct
calls to tolower()/toupper() with calls into the provider.

An assumption of these patches is that, in the UTF-8 encoding, the
logic in pg_tolower()/pg_toupper() is equivalent to
pg_ascii_tolower()/pg_ascii_toupper().
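
To illustrate why that assumption seems reasonable, here is a minimal
sketch (not code from the patches). The function names are hypothetical
stand-ins, not the real pg_ascii_tolower(). It folds only A-Z and leaves
every byte above 127 alone, so the lead and continuation bytes of a
multibyte UTF-8 sequence pass through unchanged.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for an ASCII-only tolower: folds A-Z and
 * returns every other byte, including bytes > 127, unchanged.
 */
static unsigned char
sketch_ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch + ('a' - 'A'));
    return ch;
}

/*
 * Fold a NUL-terminated string in place, one byte at a time.
 * In UTF-8, every byte of a multibyte sequence has the high bit set,
 * so only single-byte ASCII characters are ever modified.
 */
static void
sketch_fold_utf8(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) sketch_ascii_tolower((unsigned char) *s);
}
```

For example, folding the UTF-8 bytes for "ABC" followed by U+00C9
(0xC3 0x89) lowercases the three ASCII letters and leaves the two-byte
sequence intact.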

Generally these preserve existing behavior, but there are a couple of
differences:

* If using the builtin C locale (not C.UTF-8) along with a datctype
that's a non-C locale with single-byte encoding, it could affect the
results of downcase_identifier(), ltree, and fuzzystrmatch on
characters > 127. For ICU, I went to a bit of extra effort to preserve
the existing behavior here, because it's more likely to be used for
single-byte encodings.

* When using ICU or builtin C.UTF-8 along with a datctype of
"tr_TR.UTF-8", these patches affect ltree's and fuzzystrmatch's
treatment of i/I.

If these are a concern we can fix them with some hacks, but those
behaviors seem fairly obscure to me.

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com

Attachment Content-Type Size
v6-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch text/x-patch 2.1 KB
v6-0002-Define-char_tolower-char_toupper-for-all-locale-p.patch text/x-patch 8.1 KB
v6-0003-Avoid-global-LC_CTYPE-dependency-in-like.c.patch text/x-patch 931 bytes
v6-0004-Avoid-global-LC_CTYPE-dependency-in-scansup.c.patch text/x-patch 3.0 KB
v6-0005-Avoid-global-LC_CTYPE-dependency-in-pg_locale_icu.patch text/x-patch 3.9 KB
v6-0006-Avoid-global-LC_CTYPE-dependency-in-ltree-crc32.c.patch text/x-patch 716 bytes
v6-0007-Avoid-global-LC_CTYPE-dependency-in-fuzzystrmatch.patch text/x-patch 3.8 KB
v6-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch text/x-patch 3.4 KB
v6-0009-Avoid-global-LC_CTYPE-dependency-in-strcasecmp.c-.patch text/x-patch 2.5 KB