| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Remaining dependency on setlocale() |
| Date: | 2025-11-24 23:57:43 |
| Message-ID: | 450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Thu, 2025-11-20 at 16:58 -0800, Jeff Davis wrote:
> On Wed, 2025-11-12 at 19:59 +0100, Peter Eisentraut wrote:
> > Many of these issues are pre-existing, but I just figured it has
> > reached a point where we need to do something about it.
>
> I tried to simplify things in this patch series, assuming that we have
> some tolerance for small behavior changes.
>
> 0001: No behavior change here, same patch as before. Uncontroversial
> simplification, so I plan to commit this soon.
Committed.
New series attached, which I tried to put in an order that would be
reasonable for commit.
0001-0004: Pure refactoring patches. I intend to commit a couple of
these soon.
0005: No behavioral change, and not much change at all. Computes the
"max_chr" for regexes (a performance optimization for low codepoints)
more consistently and simply based on the encoding.
0006: Fixes a longstanding ltree bug caused by an inconsistency between
the database locale and the global LC_CTYPE setting when using a non-libc
provider. The end result is also cleaner: use the database locale
consistently, like tsearch. I don't intend to backport this unless
someone thinks it should be backported, but it should come with a release
note advising users to reindex ltree indexes if they use a non-libc provider.
0007: Remove the char_tolower() API completely. We'd lose a pattern
matching optimization for single-byte encodings with libc and a non-C
locale, but it's a significant simplification. We could go even further
and change this to use casefolding rather than lower(), but that seems
like a separate change.
0008: Multibyte-aware extraction of pattern prefixes. The previous code
gave up on any byte that it didn't understand, which made prefixes
unnecessarily short. This patch is also cleaner.
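
To illustrate the idea behind 0008 (this is not the patch's code), here is a minimal sketch of prefix extraction that copies whole characters instead of stopping at the first high-bit byte. It assumes a UTF-8 pattern and LIKE-style wildcards; the names `utf8_seq_len` and `like_fixed_prefix_sketch` are hypothetical, and case-insensitive matching is ignored entirely.

```c
#include <stddef.h>
#include <string.h>

/*
 * Byte length of the UTF-8 sequence that starts with lead byte c.
 * Invalid lead bytes are treated as length 1 (assumption for this sketch).
 */
static size_t
utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Hypothetical helper: copy the literal prefix of a LIKE pattern into out,
 * stopping at the first unescaped '%' or '_'. A backslash escapes the next
 * character. Multibyte characters are copied as a unit, so the prefix never
 * ends in the middle of a character. Returns the prefix length in bytes.
 */
static size_t
like_fixed_prefix_sketch(const char *pattern, char *out, size_t outsize)
{
    size_t      plen = strlen(pattern);
    size_t      i = 0;
    size_t      n = 0;

    if (outsize == 0)
        return 0;

    while (i < plen)
    {
        size_t      clen;

        if (pattern[i] == '%' || pattern[i] == '_')
            break;              /* first wildcard ends the prefix */
        if (pattern[i] == '\\')
        {
            i++;                /* skip the escape, keep the next character */
            if (i >= plen)
                break;
        }
        clen = utf8_seq_len((unsigned char) pattern[i]);
        if (i + clen > plen || n + clen >= outsize)
            break;              /* truncated character, or no room left */
        memcpy(out + n, pattern + i, clen);
        n += clen;
        i += clen;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, the pattern `café%x` (with `é` encoded as two bytes) yields the five-byte prefix `café` rather than stopping after `caf`.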
0009: Changes fuzzystrmatch to use pg_ascii_toupper(). Most functions
in the extension are unaffected, but soundex() can be affected, and I'm
not sure what exactly it's supposed to do with non-ASCII.
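
For readers unfamiliar with the distinction in 0009, here is a standalone sketch of what an ASCII-only uppercase conversion does. The function name `ascii_toupper_sketch` is hypothetical; it is meant to behave like pg_ascii_toupper() but is not copied from the PostgreSQL source.

```c
/*
 * Sketch of an ASCII-only uppercase conversion: only 'a'..'z' are mapped,
 * and every other byte (including bytes >= 0x80) passes through unchanged,
 * regardless of the process locale.
 */
static unsigned char
ascii_toupper_sketch(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}
```

By contrast, libc's toupper() may map a byte such as 0xE9 to another byte under some single-byte locales, which is where soundex() output on non-ASCII input could differ.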
0010: For downcase_identifier(), use a new provider-specific
pg_strfold_ident() method. The ICU version of this method is a work in
progress, because right now it depends on libc. I suppose it should
decode to UTF-32, then go through u_tolower(), then re-encode -- but
can the re-encoding fail? In any case, it would be a behavior change
for identifier casefolding with ICU and a single-byte encoding, which
is probably OK but the risk is non-zero.
0011: POC patch to introduce an lc_collate GUC. It would only affect
extensions, PLs, libraries, or other non-core code that happens to call
strcoll() or strxfrm(). This would address Daniel's complaint, but it's
more flexible. And by being a GUC, it's clear that we shouldn't depend
on it for any stored data. We can do something similar for LC_CTYPE
after we eliminate dependencies in core code.
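
As a rough illustration of the kind of non-core code 0011 is about, the sketch below shows a comparison whose result depends on the process-wide LC_COLLATE setting. The helper names `ext_compare` and `compare_under` are hypothetical, and calling setlocale() directly only stands in for whatever the GUC would do; it is not how the patch works.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical extension-style helper: it never names a locale, so its
 * result depends on whatever LC_COLLATE is in effect for the process.
 */
static int
ext_compare(const char *a, const char *b)
{
    return strcoll(a, b);
}

/*
 * Set LC_COLLATE to the named locale, then compare. Returns -1 if the
 * locale is not available on this system, 0 on success with the
 * comparison result stored in *result.
 */
static int
compare_under(const char *collate, const char *a, const char *b, int *result)
{
    if (setlocale(LC_COLLATE, collate) == NULL)
        return -1;
    *result = ext_compare(a, b);
    return 0;
}
```

Under the "C" locale, strcoll() compares like strcmp(), so "a" sorts after "B"; under a typical linguistic locale the order is usually reversed. That difference is why nothing stored on disk should depend on such a setting.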
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v9-0001-Inline-pg_ascii_tolower-and-pg_ascii_toupper.patch | text/x-patch | 2.4 KB |
| v9-0002-Add-define-for-UNICODE_CASEMAP_BUFSZ.patch | text/x-patch | 1.2 KB |
| v9-0003-Change-some-callers-to-use-pg_ascii_toupper.patch | text/x-patch | 1.4 KB |
| v9-0004-Allow-pg_locale_t-APIs-to-work-when-ctype_is_c.patch | text/x-patch | 6.9 KB |
| v9-0005-Make-regex-max_chr-depend-on-encoding-not-provide.patch | text/x-patch | 3.0 KB |
| v9-0006-Fix-inconsistency-between-ltree_strncasecmp-and-l.patch | text/x-patch | 3.4 KB |
| v9-0007-Remove-char_tolower-API.patch | text/x-patch | 9.1 KB |
| v9-0008-Use-multibyte-aware-extraction-of-pattern-prefixe.patch | text/x-patch | 11.5 KB |
| v9-0009-fuzzystrmatch-use-pg_ascii_toupper.patch | text/x-patch | 3.1 KB |
| v9-0010-downcase_identifier-use-method-table-from-locale-.patch | text/x-patch | 11.3 KB |
| v9-0011-Control-LC_COLLATE-with-GUC.patch | text/x-patch | 7.1 KB |