Re: Collation and primary keys

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Collation and primary keys
Date: 2025-07-24 00:23:33
Message-ID: 9b259f4c532943e428e9665122f37c099bab250e.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2025-07-23 at 13:53 +0200, Daniel Verite wrote:
> > * The libc C.UTF-8 locale was a reasonable default (though not a
> > natural language collation). But now that we have C.UTF-8 available
> > from the builtin provider, then we should encourage that instead of
> > relying on the slower, platform-specific libc implementation.
>
> Yes. In particular, we should encourage the ecosystem to support
> the new collation features so that they're widely available to
> end users.

Then I propose that we change the initdb default to builtin C.UTF-8.
Patch attached.

To get the old initdb behavior, pass --locale-provider=libc; all the
other defaults will then work as before.

The change would not disrupt upgrades (see commit 9637badd9f).

One annoyance: if your environment has an LC_CTYPE with a non-UTF-8
locale, then initdb forces LC_CTYPE=C and emits a warning.

I had previously tried, and failed, to change the default to ICU for
v16, so it's worth mentioning why I don't believe this proposal will
run into the same problems:

* ICU, while better than libc, didn't completely solve any of the
problems. This proposal completely solves the inconsistent primary key
problem, and is much faster than libc or ICU.

* In the version 16 change, we were still attempting to map environment
variables to ICU locales, which was never going to work very well. In
particular, as you pointed out, ICU has nothing to approximate the
C.UTF-8 locale. The current proposal doesn't attempt that kind of
cleverness.
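
For anyone who wants a concrete picture of the "inconsistent primary key"
point, here is a small illustrative sketch in Python (not part of the
patch). It assumes that a C.UTF-8-style collation orders strings by
Unicode code point; the sample words and the helper names
code_point_key and utf8_byte_key are made up for the example. Because
UTF-8 preserves code point order, the same result falls out of a plain
byte comparison, with no dependence on the platform's locale data.

```python
# Illustrative sketch only: models code-point ordering, which is assumed
# here to be how a C.UTF-8-style collation compares strings.
# Non-ASCII letters are written as escapes so the input is unambiguous.
words = ["zebra", "Apple", "\u00e9clair", "apple", "Zoo", "\u00e4hnlich"]


def code_point_key(s: str) -> list[int]:
    """Sort key: the sequence of Unicode code points in the string."""
    return [ord(ch) for ch in s]


def utf8_byte_key(s: str) -> bytes:
    """Sort key: the raw UTF-8 encoding, compared byte by byte."""
    return s.encode("utf-8")


by_code_point = sorted(words, key=code_point_key)
by_utf8_bytes = sorted(words, key=utf8_byte_key)

# UTF-8 byte order and code point order agree, so both sorts match.
assert by_code_point == by_utf8_bytes

print(by_code_point)
```

The ordering is not what a natural-language collation would give (all
uppercase ASCII sorts before lowercase, and accented letters sort last),
but it depends only on the bytes of the strings, so it cannot shift when
the OS or a collation library is upgraded.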

Comments?

Regards,
Jeff Davis

Attachment                                     Content-Type   Size
0001-initdb-default-to-builtin-C.UTF-8.patch   text/x-patch   7.2 KB