| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Remaining dependency on setlocale() |
| Date: | 2025-10-29 00:19:50 |
| Message-ID: | d9657a6e51aa20702447bb2386b32fea6218670f.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:
> On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:
> > On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > > I don't have a great windows development environment, and it appears CI
> > > and the buildfarm don't offer great coverage either. Can I ask for a
> > > volunteer to do the windows side of this work?
> >
> > Me neither but I'm willing to help with that, and have done lots of
> > closely related things through trial-by-CI...
Attached is a new patch series, v6.
Rather than creating new global locale_t objects, this series (along
with a separate patch for NLS[1]) removes the dependency on the global
LC_CTYPE entirely. It's a bunch of small patches that replace direct
calls to tolower()/toupper() with calls into the provider.
An assumption of these patches is that, in the UTF-8 encoding, the
logic in pg_tolower()/pg_toupper() is equivalent to
pg_ascii_tolower()/pg_ascii_toupper().
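
To make that assumption concrete, here is a minimal standalone sketch, not code from the patches. The `sketch_` names are hypothetical, and the two functions are only modeled on my understanding of the ASCII-only and locale-aware byte lowercasing paths; the real pg_tolower()/pg_ascii_tolower() may differ in detail. The two paths can disagree only on bytes with the high bit set, and only if the current LC_CTYPE classifies such a byte as uppercase.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for an ASCII-only lowercase: touches A-Z only. */
unsigned char
sketch_ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Hypothetical stand-in for a locale-aware lowercase: ASCII is handled
 * directly, and high-bit bytes are deferred to the current LC_CTYPE.
 */
unsigned char
sketch_locale_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	else if ((ch & 0x80) && isupper(ch))
		ch = (unsigned char) tolower(ch);
	return ch;
}

/* Count the byte values on which the two sketches disagree. */
int
sketch_count_mismatches(void)
{
	int			mismatches = 0;

	for (int c = 0; c < 256; c++)
	{
		if (sketch_ascii_tolower((unsigned char) c) !=
			sketch_locale_tolower((unsigned char) c))
			mismatches++;
	}
	return mismatches;
}

/*
 * Holds in the default "C" locale, where isupper() is true only for A-Z.
 * Whether it also holds under a given UTF-8 locale is the assumption
 * being discussed, not something this sketch proves.
 */
void
sketch_self_check(void)
{
	assert(sketch_count_mismatches() == 0);
}
```

To probe a particular platform, one could call setlocale(LC_CTYPE, "") with a UTF-8 locale in the environment before sketch_count_mismatches(); a nonzero count would mean the libc classifies some lone high-bit byte as uppercase, which is the case the assumption rules out.
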
Generally these patches preserve existing behavior, but there are a couple
of differences:
* If using the builtin C locale (not C.UTF-8) along with a datctype
that's a non-C locale with single-byte encoding, it could affect the
results of downcase_identifier(), ltree, and fuzzystrmatch on
characters > 127. For ICU, I went to a bit of extra effort to preserve
the existing behavior here, because it's more likely to be used for
single-byte encodings.
* When using ICU or builtin C.UTF-8, along with a datctype of
"tr_TR.UTF-8", then it will affect ltree's and fuzzystrmatch's
treatment of i/I.
If these are a concern, we can fix them with some hacks, but those
behaviors seem fairly obscure to me.
Regards,
Jeff Davis
[1] https://www.postgresql.org/message-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch | text/x-patch | 2.1 KB |
| v6-0002-Define-char_tolower-char_toupper-for-all-locale-p.patch | text/x-patch | 8.1 KB |
| v6-0003-Avoid-global-LC_CTYPE-dependency-in-like.c.patch | text/x-patch | 931 bytes |
| v6-0004-Avoid-global-LC_CTYPE-dependency-in-scansup.c.patch | text/x-patch | 3.0 KB |
| v6-0005-Avoid-global-LC_CTYPE-dependency-in-pg_locale_icu.patch | text/x-patch | 3.9 KB |
| v6-0006-Avoid-global-LC_CTYPE-dependency-in-ltree-crc32.c.patch | text/x-patch | 716 bytes |
| v6-0007-Avoid-global-LC_CTYPE-dependency-in-fuzzystrmatch.patch | text/x-patch | 3.8 KB |
| v6-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch | text/x-patch | 3.4 KB |
| v6-0009-Avoid-global-LC_CTYPE-dependency-in-strcasecmp.c-.patch | text/x-patch | 2.5 KB |