Re: Remaining dependency on setlocale()

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remaining dependency on setlocale()
Date: 2025-11-27 01:08:16
Message-ID: 6DDB0693-29E0-41B6-AE63-F0723F7DA8CD@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 26, 2025, at 09:50, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> I will review the rest 3 commits tomorrow.

10 - 0009
```
{
if (isalpha((unsigned char) c))
{
- c = toupper((unsigned char) c);
+ c = pg_ascii_toupper((unsigned char) c);
```

Just curious. As isaplha() and toupper() come from the same header file ctype.h, if we replace toupper with pg_ascii_toupper, does isapha also need to be handled?

11 - 0010
```
- for (i = 0; i < len; i++)
- {
- unsigned char ch = (unsigned char) ident[i];
+ dstsize = len + 1;
+ result = palloc(dstsize);

- if (ch >= 'A' && ch <= 'Z')
- ch += 'a' - 'A';
- else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
- ch = tolower(ch);
- result[i] = (char) ch;
- }
- result[i] = '\0';
+ needed = pg_strfold_ident(result, dstsize, ident, len);
+ Assert(needed + 1 == dstsize);
+ Assert(needed == len);
```

I think assert both dstsize and len are redundant, because dstsize=len+1, and no place to change their values.

12 - 0010
```
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+ if (default_locale == NULL || default_locale->ctype == NULL)
+ {
+ int i;
+
+ for (i = 0; i < srclen && i < destsize; i++)
+ {
+ unsigned char ch = (unsigned char) src[i];
+
+ if (ch >= 'A' && ch <= 'Z')
+ ch += 'a' - 'A';
+ dest[i] = (char) ch;
+ }
+
+ if (i < destsize)
+ dest[i] = '\0';
+
+ return srclen;
+ }
+ return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+ default_locale);
+}
```

Given default_local can be NULL only at some specific moment, can we do something like

Local = default_local;
If (local == NULL || local->ctype == NULL)
Local = libc or other fallback;
Return default_locale->ctype->strfold_ident(dest, destsize, src, srclen, local);

This way avoids the duplicate code.

13 - 0011
```
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+ short_desc => 'Sets the locale for text ordering in extensions.',
```

I just feel the GUC name is very misleading. Without carefully reading the doc, users may very easy to consider lc_collate the system’s locale. If it only affects extensions, then let’s name it accordingly, for example, “extension_lc_collate”, or “legacy_lc_collate”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2025-11-27 01:16:42 Re: Row pattern recognition
Previous Message David Rowley 2025-11-27 01:07:03 Re: Support tid range scan in parallel?