From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: new environment variable INITDB_LOCALE_PROVIDER |
Date: | 2025-10-11 02:06:02 |
Message-ID: | 13e3d042637c3a2c821d380924a79da045c99f5f.camel@j-davis.com |
Lists: | pgsql-hackers |
On Sat, 2025-10-11 at 08:30 +0800, Chao Li wrote:
> * If we make that fail, I don’t think that would break existing
> scripts. Because the default provider is libc and you are introducing
> a new environment variable to set locale provider, thus a plain
> initdb will not use builtin provider. Maybe provider can come from
> PG_TEST_INITDB_EXTRA_OPTS, I'm ok for test environment to only
> issue warnings.
I would like it to be possible to change the initdb default in the
future to "builtin". See:
https://www.postgresql.org/message-id/e4ac16908dad3eddd3ed73c4862591375a3f0539.camel@j-davis.com
In that case, initdb should be able to succeed without other options.
> * I am thinking loudly. Builtin provider is more performant but with
> certain limitations. Some production users may want to try builtin
> provider for better performance but not being aware of the
> limitation. Their environment contains the actual LC_CTYPE/LC_COLLATE
> they want to use, and they set the new environment variable with
> “builtin” for provider. In this case, failing “initdb” would make the
> user clearly realize the limitation of builtin provider. Otherwise,
> if the user also ignores the warning messages, then the database
> would be created with unexpected ctype, which would lead to loss
> (time, data, etc.)
What limitation and/or loss are you concerned about?
Unless I'm mistaken, LC_CTYPE has very little practical effect when the
provider is builtin and the encoding is UTF-8.
The main effect that I'm aware of is that system errors from the OS
rely on LC_CTYPE for translation. Ordinary Postgres messages don't need
LC_CTYPE, so most of NLS still works even with LC_CTYPE=C; it's just
strerror() that depends on LC_CTYPE for the encoding.
LC_CTYPE also affects full text search parsing, but I'm fixing that as
part of another patch to use the database locale instead.
I think contrib/fuzzystrmatch may be affected.
Callers of pg_strcasecmp() could be affected, but it's mostly used to
compare against ASCII strings anyway.
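To illustrate why ASCII-only comparisons are insensitive to LC_CTYPE,
here is a minimal sketch in Python (not C, and not PostgreSQL's actual
code). The names ascii_fold and ascii_strcasecmp are hypothetical. The
sketch folds only the bytes 'A'..'Z', so for pure-ASCII input no locale
lookup is involved at all:

```python
def ascii_fold(byte: int) -> int:
    """Lowercase one byte if it is an ASCII uppercase letter."""
    if 0x41 <= byte <= 0x5A:  # 'A'..'Z'
        return byte + 0x20
    # Assumption of this sketch: bytes >= 0x80 are left untouched.
    return byte


def ascii_strcasecmp(a: bytes, b: bytes) -> int:
    """Return <0, 0, or >0 in the style of C's strcasecmp(),
    using ASCII-only case folding (hypothetical helper)."""
    for x, y in zip(a, b):
        fx, fy = ascii_fold(x), ascii_fold(y)
        if fx != fy:
            return fx - fy
    # All shared bytes matched; the shorter string sorts first.
    return len(a) - len(b)


if __name__ == "__main__":
    print(ascii_strcasecmp(b"BUILTIN", b"builtin"))
```

Locale-sensitive behavior could only enter for bytes outside the ASCII
range, which this sketch deliberately does not fold.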
If you are aware of other areas, please let me know.
Regards,
Jeff Davis