Re: Remaining dependency on setlocale()

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2025-12-12 20:11:40
Message-ID: 0e186a9a92634f0c5675a618ff5685d00cd8f836.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2025-12-05 at 16:01 +0100, Peter Eisentraut wrote:
> v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch
>
> The function comment reads "Check if b has a prefix of a." -- Is that
> the same as "Check if a is a prefix of b."?  The latter might be
> clearer.

Yes, fixed.

Note: I separated this into two patches. 0003 fixes the multibyte
mishandling issue, and 0004 consistently performs case folding. 0003 is
backpatchable, I believe.
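
For readers following along, the reworded contract ("check if a is a
prefix of b") can be sketched as below. This is only an illustration
under a loud assumption: it folds ASCII letters byte by byte, which is
exactly the simplification that is unsafe for multibyte text. The names
ascii_fold() and ascii_prefix_ci() are hypothetical and are not taken
from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: fold ASCII A-Z to a-z, leave all other bytes alone. */
static unsigned char
ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char) (c + ('a' - 'A')) : c;
}

/*
 * Return true if the alen bytes of a are a case-insensitive (ASCII-only)
 * prefix of the blen bytes of b. Assumption: byte-wise folding is enough,
 * which holds only for ASCII input.
 */
static bool
ascii_prefix_ci(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);

    /* A longer string can never be a prefix of a shorter one. */
    if (alen > blen)
        return false;

    for (size_t i = 0; i < alen; i++)
    {
        if (ascii_fold((unsigned char) a[i]) != ascii_fold((unsigned char) b[i]))
            return false;
    }
    return true;
}
```

Note the argument order: the candidate prefix comes first, matching the
clearer wording of the comment.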

> but the patch removes SB_lower_char().

Fixed and committed.

> v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch
>
> Might have a small typo in the commit message:
>
> ; and preserve and char-at-a-time logic for bytea.

Fixed.

I also split it into two functions: like_fixed_prefix(), which is
almost unchanged from the original; and like_fixed_prefix_ci(), which
is multibyte-aware and locale-aware. It was too confusing to have
single-byte and multibyte logic in the same function, and they didn't
share much code anyway.
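
To make the single-byte side concrete, here is a rough sketch of
char-at-a-time prefix extraction from a LIKE pattern. It is not the code
in the patch: the name like_prefix_sketch(), its signature, and the
output-buffer convention are all assumptions made for illustration. It
only shows the general idea of stopping at the first unescaped wildcard.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: copy the literal prefix of a LIKE pattern into out
 * (capacity outsz bytes, including the terminating NUL) and return the
 * prefix length. Stops at the first unescaped '%' or '_'; a backslash
 * escapes the byte that follows it.
 */
static size_t
like_prefix_sketch(const char *pattern, char *out, size_t outsz)
{
    size_t      n = 0;

    assert(pattern != NULL && out != NULL && outsz > 0);

    for (size_t i = 0; pattern[i] != '\0' && n + 1 < outsz; i++)
    {
        /* An unescaped wildcard ends the fixed prefix. */
        if (pattern[i] == '%' || pattern[i] == '_')
            break;

        if (pattern[i] == '\\')
        {
            i++;
            if (pattern[i] == '\0')
                break;          /* trailing backslash: nothing to escape */
        }
        out[n++] = pattern[i];
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, the pattern abc%def yields the prefix abc, and a
backslash-escaped wildcard is kept in the prefix as a literal byte. A
case-insensitive variant cannot work this way, because it has to step
over whole characters rather than bytes.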

> case '\xc7':        /* C with cedilla */
>
> so the premise that "fuzzystrmatch is designed for ASCII" does not
> appear to be correct.  Needs more analysis.
>
> (But apparently it's not multibyte aware at all, so I don't know what
> to do about that.)

I didn't notice that, thank you. Agreed, we need a bit more discussion
around this case as well as soundex().

> v11-0008-downcase_identifier-use-method-table-from-locale.patch
>
> I'm confused here about the name of the function pg_strfold_ident().
> In general, case "folding" results in an opaque string that is really
> only useful for comparing against other case-folded strings.  But for
> identifiers we are actually interested lower-casing.  I think this
> should be corrected in the API naming.

Agreed and fixed.

Also, I added 0006, which saves a locale_t object for ICU in this one
case where it's required. Surely that's not what we want in the long
term, but we don't have the infrastructure for decoding pg_wchar into
code points yet, and 0006 avoids the dependency on the global LC_CTYPE
setting.
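
As a general illustration of the technique (not the contents of 0006),
the POSIX.1-2008 calls newlocale() and tolower_l() let code keep its own
locale object instead of consulting whatever setlocale() last installed.
The saved_ctype struct and the three helper names below are
hypothetical, and platforms differ in which header declares these
functions, so treat this as a sketch that assumes a glibc-like system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L    /* for newlocale(), tolower_l() */
#endif

#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical holder for a locale object that is created once and reused. */
typedef struct
{
    locale_t    loc;
} saved_ctype;

/* Create the saved locale; returns 0 on success, -1 on failure. */
static int
saved_ctype_init(saved_ctype *s, const char *name)
{
    assert(s != NULL && name != NULL);
    s->loc = newlocale(LC_CTYPE_MASK, name, (locale_t) 0);
    return (s->loc == (locale_t) 0) ? -1 : 0;
}

/* Lower-case one single-byte character using only the saved locale. */
static int
saved_ctype_tolower(const saved_ctype *s, unsigned char c)
{
    return tolower_l(c, s->loc);
}

/* Release the saved locale object. */
static void
saved_ctype_free(saved_ctype *s)
{
    freelocale(s->loc);
    s->loc = (locale_t) 0;
}
```

Because every call passes the locale explicitly, the result does not
change if some other code calls setlocale() in between.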

> v11-0009-Control-LC_COLLATE-with-GUC.patch
>
> I know there were some complaints about compatibility with extensions,
> but I don't think anything concrete was presented.  I would like to see
> more evidence that we need this.
>
> Also, recall that we used to have a lc_collate GUC, and in the end
> people got confused that it didn't actually show a meaningful value
> when you used ICU.  So we removed that.  It seems adding this back in
> would create a similar kind of confusion.  So to avoid that, maybe this
> should be called fallback_lc_collate or something like that.

Yes, this is a POC patch and needs more discussion.

What are your thoughts about a similar lc_ctype GUC, though? That has
slightly different trade-offs.

I believe v12 0001-0005 are about ready for commit, and 0003 should be
backported.

Regards,
Jeff Davis

Attachment                                                        Content-Type  Size
v12-0001-Use-multibyte-aware-extraction-of-pattern-prefix.patch   text/x-patch  8.1 KB
v12-0002-Remove-unused-single-byte-char_is_cased-API.patch        text/x-patch  5.4 KB
v12-0003-Fix-multibyte-issue-in-ltree_strncasecmp.patch           text/x-patch  5.7 KB
v12-0004-Fix-inconsistency-between-ltree_strncasecmp-and-.patch   text/x-patch  3.8 KB
v12-0005-downcase_identifier-use-method-table-from-locale.patch   text/x-patch  10.4 KB
v12-0006-Avoid-global-LC_CTYPE-dependency-in-pg_locale_ic.patch   text/x-patch  3.9 KB
v12-0007-fuzzystrmatch-use-pg_ascii_toupper.patch                 text/x-patch  5.0 KB
v12-0008-Control-LC_COLLATE-with-GUC.patch                        text/x-patch  7.2 KB
