| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Remaining dependency on setlocale() |
| Date: | 2025-12-12 20:11:40 |
| Message-ID: | 0e186a9a92634f0c5675a618ff5685d00cd8f836.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Fri, 2025-12-05 at 16:01 +0100, Peter Eisentraut wrote:
> v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch
>
> The function comment reads "Check if b has a prefix of a." -- Is that
> the same as "Check if a is a prefix of b."? The latter might be
> clearer.
Yes, fixed.
Note: I separated this into two patches. 0003 fixes the multibyte
mishandling issue, and 0004 consistently performs case folding. 0003 is
backpatchable, I believe.
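
To make the direction of the reworded comment concrete, here is a minimal sketch of "check if a is a prefix of b" with case folding. It is not the ltree code: `sketch_is_prefix_ci()` and `sketch_fold_ascii()` are hypothetical names, and the folding is byte-wise and ASCII-only, so it deliberately ignores the multibyte handling that 0003 addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: fold ASCII 'A'..'Z' to lower case, leave other bytes alone. */
static unsigned char
sketch_fold_ascii(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch - 'A' + 'a');
    return ch;
}

/*
 * Hypothetical helper: return true if a (alen bytes) is a prefix of
 * b (blen bytes), ignoring ASCII case. Byte-at-a-time, so it is only
 * an illustration of the argument order, not a multibyte-safe comparison.
 */
static bool
sketch_is_prefix_ci(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);

    /* a cannot be a prefix of something shorter than itself */
    if (alen > blen)
        return false;

    for (size_t i = 0; i < alen; i++)
    {
        if (sketch_fold_ascii((unsigned char) a[i]) !=
            sketch_fold_ascii((unsigned char) b[i]))
            return false;
    }
    return true;
}
```

Note that the arguments are not symmetric: "Top" is a prefix of "topScience" under this check, but not the other way around.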
> but the patch removes SB_lower_char().
Fixed and committed.
> v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch
>
> Might have a small typo in the commit message:
>
> ; and preserve and char-at-a-time logic for bytea.
Fixed.
I also changed it into two functions: like_fixed_prefix(), which is
almost unchanged from the original; and like_fixed_prefix_ci(), which
is multibyte and locale-aware. It was too confusing to have single-byte
and multi-byte logic in the same function, and they didn't share much
code anyway.
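
For readers unfamiliar with what "fixed prefix" means here, the following is a rough sketch of the byte-at-a-time idea: copy the literal leading part of a LIKE pattern until the first unescaped wildcard. It is not the code from the patch. `sketch_like_fixed_prefix()` and its signature are hypothetical, the escape character is assumed to be a backslash, and the logic is only appropriate for single-byte data such as bytea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the PostgreSQL function of a similar name.
 *
 * Copies the literal prefix of the LIKE pattern patt (pattlen bytes) into
 * dst, NUL-terminates it, and returns its length in bytes. Scanning stops
 * at the first unescaped '%' or '_'. *is_exact is set to true when the
 * whole pattern was consumed, i.e. it contained no wildcard.
 */
static size_t
sketch_like_fixed_prefix(const char *patt, size_t pattlen,
                         char *dst, size_t dstsize, bool *is_exact)
{
    size_t      pos = 0;        /* read position in patt */
    size_t      out = 0;        /* write position in dst */

    assert(dstsize > 0);

    while (pos < pattlen && out + 1 < dstsize)
    {
        char        c = patt[pos];

        /* An unescaped wildcard ends the fixed prefix. */
        if (c == '%' || c == '_')
            break;

        /* A backslash makes the next byte literal. */
        if (c == '\\')
        {
            pos++;
            if (pos >= pattlen)
                break;          /* trailing backslash: nothing to escape */
            c = patt[pos];
        }

        dst[out++] = c;
        pos++;
    }

    dst[out] = '\0';
    *is_exact = (pos >= pattlen);
    return out;
}
```

A case-insensitive, multibyte-aware variant cannot simply walk bytes like this, which is one reason to keep the two paths in separate functions.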
> case '\xc7': /* C with cedilla */
>
> so the premise that "fuzzystrmatch is designed for ASCII" does not
> appear to be correct. Needs more analysis.
>
> (But apparently it's not multibyte aware at all, so I don't know what
> to do about that.)
I didn't notice that, thank you. Agreed, we need a bit more discussion
around this case as well as soundex().
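
As a small illustration of why the `'\xc7'` case matters, here is a sketch of ASCII-only upper-casing. The helper names are hypothetical and this is not the fuzzystrmatch code. The point is that a byte with the high bit set, such as 0xC7, passes through ASCII-only case mapping untouched, so code that matches on such bytes is not purely ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: upper-case ASCII 'a'..'z' only, leave every other byte as is. */
static unsigned char
sketch_ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}

/*
 * Hypothetical helper: report whether a NUL-terminated string contains any
 * byte outside the ASCII range. Such bytes could be single-byte-encoding
 * letters (for example 0xC7) or pieces of a multibyte character.
 */
static bool
sketch_has_non_ascii(const char *s)
{
    assert(s != NULL);

    for (size_t i = 0; s[i] != '\0'; i++)
    {
        if ((unsigned char) s[i] >= 0x80)
            return true;
    }
    return false;
}
```

With this mapping, 'q' becomes 'Q', while 0xC7 and 0xE7 both come back unchanged.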
> v11-0008-downcase_identifier-use-method-table-from-locale.patch
>
> I'm confused here about the name of the function pg_strfold_ident().
> In general, case "folding" results in an opaque string that is really
> only useful for comparing against other case-folded strings. But for
> identifiers we are actually interested in lower-casing. I think this
> should be corrected in the API naming.
Agreed and fixed.
Also, I added 0006, which saves a locale_t object for ICU in this one
case where it's required. Surely that's not what we want in the long
term, but we don't have the infrastructure for decoding pg_wchar into
code points yet, and 0006 avoids the dependency on the global LC_CTYPE
setting.
> v11-0009-Control-LC_COLLATE-with-GUC.patch
>
> I know there were some complaints about compatibility with extensions,
> but I don't think anything concrete was presented. I would like to see
> more evidence that we need this.
>
> Also, recall that we used to have a lc_collate GUC, and in the end
> people got confused that it didn't actually show a meaningful value
> when you used ICU. So we removed that. It seems adding this back in
> would create a similar kind of confusion. So to avoid that, maybe this
> should be called fallback_lc_collate or something like that.
Yes, this is a POC patch and needs more discussion.
What are your thoughts about a similar lc_ctype GUC, though? That has
slightly different trade-offs.
I believe v12 0001-0005 are about ready for commit, and 0003 should be
backported.
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v12-0001-Use-multibyte-aware-extraction-of-pattern-prefix.patch | text/x-patch | 8.1 KB |
| v12-0002-Remove-unused-single-byte-char_is_cased-API.patch | text/x-patch | 5.4 KB |
| v12-0003-Fix-multibyte-issue-in-ltree_strncasecmp.patch | text/x-patch | 5.7 KB |
| v12-0004-Fix-inconsistency-between-ltree_strncasecmp-and-.patch | text/x-patch | 3.8 KB |
| v12-0005-downcase_identifier-use-method-table-from-locale.patch | text/x-patch | 10.4 KB |
| v12-0006-Avoid-global-LC_CTYPE-dependency-in-pg_locale_ic.patch | text/x-patch | 3.9 KB |
| v12-0007-fuzzystrmatch-use-pg_ascii_toupper.patch | text/x-patch | 5.0 KB |
| v12-0008-Control-LC_COLLATE-with-GUC.patch | text/x-patch | 7.2 KB |