| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Remaining dependency on setlocale() |
| Date: | 2025-11-21 00:58:16 |
| Message-ID: | 8186b28a1a39e61a0d833a4c25a8909ebbbabd48.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Wed, 2025-11-12 at 19:59 +0100, Peter Eisentraut wrote:
> Many of these issues are pre-existing, but I just figured it has reached
> a point where we need to do something about it.
I tried to simplify things in this patch series, assuming that we have
some tolerance for small behavior changes.
0001: No behavior change here, same patch as before. Uncontroversial
simplification, so I plan to commit this soon.
0002: change fuzzystrmatch to use ASCII semantics. As far as I can
tell, this only affects the results of soundex(). Before the patch, in
en_US.iso885915, soundex('réd') was 'RÉ30', after the patch it's
'Ré30'. I'm not sure whether the current behavior is intentional or
not. Other functions (daitch_mokotoff, levenshtein, and metaphone) are
unaffected as far as I can tell.
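To illustrate why the soundex() output changes, here is a minimal sketch of ASCII-only upper-casing. The function names are hypothetical stand-ins written for this example, not the code in the patch; the point is only that a byte such as 0xE9 ('é' in ISO-8859-15) passes through untouched, whereas a locale-aware toupper() could map it to 0xC9 ('É').

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for an ASCII-only upper-casing helper: only the
 * bytes 'a'..'z' are mapped; every other byte is returned unchanged.
 */
static unsigned char
ascii_toupper_sketch(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}

/* Upper-case a NUL-terminated byte string in place, ASCII letters only. */
static void
ascii_upcase_inplace(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) ascii_toupper_sketch((unsigned char) *s);
}
```

Because the result never depends on the process-wide LC_CTYPE setting, the same input bytes produce the same output in every locale.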
0003+0005: change ltree to use case folding instead of tolower(). I
believe this is a bug fix, because the current code is inconsistent
between ltree_strncasecmp() and ltree_crc32_sz().
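The invariant at stake can be sketched as follows: a case-insensitive comparison and a case-insensitive hash must fold through the same function, or two labels that compare equal may hash differently. This is only an illustration under simplifying assumptions: the names are hypothetical, the fold is ASCII-only rather than locale-aware case folding, and FNV-1a stands in for the CRC32 that ltree actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Single folding routine shared by the comparison and the hash (ASCII only). */
static unsigned char
fold_byte(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch - 'A' + 'a');
    return ch;
}

/* Compare the first n bytes of a and b after folding each byte. */
static int
fold_ncasecmp(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        unsigned char ca = fold_byte((unsigned char) a[i]);
        unsigned char cb = fold_byte((unsigned char) b[i]);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
    }
    return 0;
}

/* Hash n bytes of s after folding each byte (FNV-1a, not CRC32). */
static uint32_t
fold_hash(const char *s, size_t n)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++)
    {
        h ^= fold_byte((unsigned char) s[i]);
        h *= 16777619u;
    }
    return h;
}
```

If fold_ncasecmp() and fold_hash() used different folding rules, hash-based lookups could miss entries that the comparison function considers equal.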
0006-0007: Remove char_tolower() API. This also removes the
optimization for single-byte encodings with the libc provider and a
non-C locale, but simplifies the code (the optimization is retained for
the C locale). It's possible to make the lazy-folding optimization work
for all locales without the char_tolower() API by doing something
similar to what 0004 does for ltree. But to make this work efficiently
for Generic_Text_IC_like() would be a bit more complex: we'd need to
adjust MatchText() to be able to fold the arguments lazily, and perhaps
introduce some kind of casemapping iterator. That's already a pretty
complex function, so I'm hesitant to do that work unless the
optimization is important.
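For readers unfamiliar with the term, "lazy folding" means folding each character only when the matcher actually examines it, instead of case-folding both whole strings up front. A rough sketch of the idea, using hypothetical names, an ASCII-only fold, and a simple prefix match in place of the much more involved MatchText():

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only fold, standing in for a real per-character casemapping step. */
static unsigned char
lazy_fold(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch - 'A' + 'a');
    return ch;
}

/*
 * Return true if pat (plen bytes) is a case-insensitive prefix of text
 * (tlen bytes). Bytes are folded one at a time as they are compared, so
 * nothing past the first mismatch is ever folded and no folded copy of
 * either string is allocated.
 */
static bool
lazy_ci_prefix_match(const char *text, size_t tlen,
                     const char *pat, size_t plen)
{
    if (plen > tlen)
        return false;

    for (size_t i = 0; i < plen; i++)
    {
        if (lazy_fold((unsigned char) text[i]) !=
            lazy_fold((unsigned char) pat[i]))
            return false;
    }
    return true;
}
```

A byte-at-a-time fold like this is only valid where one input byte maps to one output byte, which is why doing the same for multibyte encodings would call for something like a casemapping iterator.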
These patches don't get us quite to the point of eliminating the
LC_CTYPE dependency (there's still downcase_identifier() and
pg_strcasecmp() to worry about, and some assorted isxyz() calls to
examine), but they simplify things enough that the path forward will be
easier.
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v8-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch | text/x-patch | 2.1 KB |
| v8-0002-fuzzystrmatch-use-pg_ascii_toupper.patch | text/x-patch | 3.1 KB |
| v8-0003-Add-define-for-UNICODE_CASEMAP_BUFSZ.patch | text/x-patch | 1.2 KB |
| v8-0004-Allow-pg_locale_t-APIs-to-work-when-ctype_is_c.patch | text/x-patch | 6.9 KB |
| v8-0005-Fix-inconsistency-between-ltree_strncasecmp-and-l.patch | text/x-patch | 3.4 KB |
| v8-0006-Inline-pg_ascii_tolower-and-pg_ascii_toupper.patch | text/x-patch | 2.4 KB |
| v8-0007-Remove-char_tolower-API.patch | text/x-patch | 9.3 KB |