Re: new environment variable INITDB_LOCALE_PROVIDER

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: new environment variable INITDB_LOCALE_PROVIDER
Date: 2025-10-11 05:53:59
Message-ID: 77D14CC3-27E7-4EAE-811C-4B58C8C112A5@gmail.com
Lists: pgsql-hackers

> On Oct 11, 2025, at 10:06, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Sat, 2025-10-11 at 08:30 +0800, Chao Li wrote:
>> * If we make that fail, I don’t think that would break existing
>> scripts. Because the default provider is libc and you are introducing
>> a new environment variable to set locale provider, thus a plain
>> initdb will not use builtin provider. Maybe provider can come from
>> PG_TEST_INITDB_EXTRA_OPTS, I'm ok for test environment to only
>> issue warnings.
>
> I would like it to be possible to change the initdb default in the
> future to "builtin". See:
>
> https://www.postgresql.org/message-id/e4ac16908dad3eddd3ed73c4862591375a3f0539.camel@j-davis.com
>
> in that case, initdb should be able to succeed without other options.

Yes, if we decide to go down that path, then what I said would no longer be valid.

>
>> * I am thinking loudly. Builtin provider is more performant but with
>> certain limitations. Some production users may want to try builtin
>> provider for better performance but not being aware of the
>> limitation. Their environment contains the actual LC_CTYPE/LC_COLLATE
>> they want to use, and they set the new environment variable with
>> “builtin” for provider. In this case, failing “initdb” would make the
>> user clearly realize the limitation of builtin provider. Otherwise,
>> if the user also ignores the warning messages, then the database
>> would be created with unexpected ctype, which would lead to loss
>> (time, data, etc.)
>
> What limitation and/or loss are you concerned about?
>

By the limitation of the builtin provider, I just meant that it supports fewer LC_CTYPE/LC_COLLATE values than the other two providers.
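
To make the fail-versus-warn trade-off concrete, here is a minimal Python sketch. It is not the patch's code: the helper name, the `strict` flag, and the fallback value are all hypothetical, and the set of accepted names is my assumption about what the builtin provider takes (C and C.UTF-8 since PostgreSQL 17, PG_UNICODE_FAST since 18).

```python
import warnings

# Assumption: locale names the builtin provider accepts.
BUILTIN_LOCALES = {"C", "C.UTF-8", "PG_UNICODE_FAST"}


def resolve_builtin_locale(env_locale, strict, fallback="C.UTF-8"):
    """Hypothetical helper: choose the locale for the builtin provider.

    env_locale is the name taken from the environment (e.g. LC_CTYPE).
    With strict=True an unsupported name is an error; otherwise a
    warning is issued and the (assumed) fallback is used instead.
    """
    if env_locale in BUILTIN_LOCALES:
        return env_locale

    message = f"locale {env_locale!r} is not supported by the builtin provider"
    if strict:
        # "Fail" behavior: the user notices the limitation immediately.
        raise ValueError(message)

    # "Warn" behavior: initialization proceeds with a different locale.
    warnings.warn(f"{message}; using {fallback!r} instead")
    return fallback
```

With `strict=False`, a user whose environment has `en_US.UTF-8` would get a cluster using `C.UTF-8` plus a warning, which is the case I was wondering about.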

I wasn’t concerned about anything specific; I was just imagining whether anything could be negatively affected.

> Unless I'm mistaken, LC_CTYPE has very little practical effect when the
> provider is builtin and the encoding is UTF-8.
>
> The main effect that I'm aware of is that system errors from the OS
> rely on LC_CTYPE for translation. Ordinary Postgres messages don't need
> LC_CTYPE, so most of NLS still works even with LC_CTYPE=C; it's just
> strerror() that depends on LC_CTYPE for the encoding.
>
> LC_CTYPE also affects full text search parsing, but I'm fixing that as
> part of another patch to use the database locale instead.
>
> I think contrib/fuzzystrmatch may be affected.
>
> Callers of pg_strcasecmp() could be affected, but it's mostly used to
> compare with ascii anyway.
>
> If you are aware of other areas, please let me know.
>

Thanks for the explanation. I think I am good now. The latest v3 patch looks good to me.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
