Re: new environment variable INITDB_LOCALE_PROVIDER

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: new environment variable INITDB_LOCALE_PROVIDER
Date: 2025-10-11 02:06:02
Message-ID: 13e3d042637c3a2c821d380924a79da045c99f5f.camel@j-davis.com
Lists: pgsql-hackers

On Sat, 2025-10-11 at 08:30 +0800, Chao Li wrote:
> * If we make that fail, I don’t think that would break existing
> scripts. Because the default provider is libc and you are introducing
> a new environment variable to set locale provider, thus a plain
> initdb will not use builtin provider. Maybe provider can come from
> PG_TEST_INITDB_EXTRA_OPTS; I'm OK with the test environment only
> issuing warnings.

I would like it to be possible to change the initdb default in the
future to "builtin". See:

https://www.postgresql.org/message-id/e4ac16908dad3eddd3ed73c4862591375a3f0539.camel@j-davis.com

In that case, initdb should be able to succeed without other options.

> * I am thinking out loud. The builtin provider is more performant but
> has certain limitations. Some production users may want to try the
> builtin provider for better performance without being aware of those
> limitations. Their environment contains the actual LC_CTYPE/LC_COLLATE
> they want to use, and they set the new environment variable with
> “builtin” for provider. In this case, failing “initdb” would make the
> user clearly realize the limitation of builtin provider. Otherwise,
> if the user also ignores the warning messages, then the database
> would be created with unexpected ctype, which would lead to loss
> (time, data, etc.)

What limitation and/or loss are you concerned about?

Unless I'm mistaken, LC_CTYPE has very little practical effect when the
provider is builtin and the encoding is UTF-8.

The main effect that I'm aware of is that system errors from the OS
rely on LC_CTYPE for translation. Ordinary Postgres messages don't need
LC_CTYPE, so most of NLS still works even with LC_CTYPE=C; it's just
strerror() that depends on LC_CTYPE for the encoding.
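To make the strerror() point concrete, here is a minimal sketch in Python rather than C. It is not taken from the patch; the helper name os_error_text is hypothetical. It only shows where LC_CTYPE enters the picture: the process sets LC_CTYPE, then asks the OS for an error string, which is returned in whatever encoding that setting implies.

```python
import errno
import locale
import os


def os_error_text(code: int, ctype_locale: str = "C") -> str:
    """Return the OS error string for `code` while LC_CTYPE is `ctype_locale`.

    Hypothetical helper for illustration only. The previous LC_CTYPE
    setting is restored before returning.
    """
    # Calling setlocale() with no locale argument queries the current setting.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, ctype_locale)
        # os.strerror() wraps the C library's strerror().
        return os.strerror(code)
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    print(os_error_text(errno.ENOENT))
```

The exact text printed depends on the platform and on the other locale categories, so no output is shown here.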

LC_CTYPE also affects full text search parsing, but I'm fixing that as
part of another patch to use the database locale instead.

I think contrib/fuzzystrmatch may be affected.

Callers of pg_strcasecmp() could be affected, but it's mostly used to
compare with ascii anyway.
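As an illustration of why ASCII comparisons are insensitive to the locale, here is a sketch in Python of a comparison that folds only the bytes A-Z. This is not the PostgreSQL implementation of pg_strcasecmp(); the names ascii_lower and ascii_strcasecmp are hypothetical, and the sketch assumes that locale-dependent behavior could only matter for bytes outside the ASCII range.

```python
def ascii_lower(byte: int) -> int:
    """Fold only the bytes A-Z to a-z; leave every other byte unchanged."""
    if ord("A") <= byte <= ord("Z"):
        return byte + (ord("a") - ord("A"))
    return byte


def ascii_strcasecmp(left: bytes, right: bytes) -> int:
    """Compare two byte strings, ignoring case for ASCII letters only.

    Returns a negative, zero, or positive integer, in the style of
    strcmp(). No locale is consulted at any point.
    """
    for a, b in zip(left, right):
        folded_a, folded_b = ascii_lower(a), ascii_lower(b)
        if folded_a != folded_b:
            return folded_a - folded_b
    # All shared positions matched; the shorter string sorts first.
    return len(left) - len(right)


if __name__ == "__main__":
    # Typical use: matching an option value against an ASCII keyword.
    print(ascii_strcasecmp(b"BUILTIN", b"builtin") == 0)
```

When both arguments are pure ASCII, as with keywords and option names, the result cannot vary with LC_CTYPE because only the fixed A-Z range is ever folded.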

If you are aware of other areas, please let me know.

Regards,
Jeff Davis
