Re: Collation and primary keys

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Collation and primary keys
Date: 2025-07-17 06:30:45
Message-ID: dddea03680530adf4efb4efb30dad54af8aca269.camel@cybertec.at
Lists: pgsql-hackers

On Wed, 2025-07-16 at 09:46 -0700, Jeff Davis wrote:
> On Wed, 2025-07-16 at 08:29 +0200, Laurenz Albe wrote:
> > I have a radical proposal: Rather than having "initdb" default to
> > whatever locale is in the environment, make it default to the
> > builtin provider and the C collation.  Wherever people need a natural
> > language collation, they can say so explicitly.
>
> You bring up a good sub-point, which is that there are actually three
> builtin locales[1]: C, C.UTF-8, and PG_UNICODE_FAST. All three have
> exactly the same sorting and equality semantics (memcmp()), and
> therefore any of them would solve the problems raised in this thread.
>
> > Not that I want to present Oracle as an example to follow in general,
> > but that's how they are doing it, and while I do hear complaints from
> > Oracle users, I have yet to hear a complaint about the default binary
> > collation.
>
> My understanding was that, while it does binary sort order, it still
> does Unicode-aware case mapping.
>
> If so, that would be closer to the C.UTF-8 locale (Unicode Simple Case
> Mapping) or the PG_UNICODE_FAST locale (Unicode Full Case Mapping,
> which includes multi-character mappings like 'ß' to 'SS').
>
> Note that the SQL standard seems to require Unicode Full Case Mapping.

I wasn't aware how Oracle handles case mapping, but it seems you
are right:

SQL> SELECT upper('ı'), upper('ä') FROM dual;

U UP
- --
I Ä

Perhaps then using one of the collations you mentioned would be the
best solution.
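
To make the distinction between those collations concrete, here is a minimal Python sketch. It assumes that sorting by UTF-8 bytes stands in for memcmp() ordering and that Python's str.upper() stands in for Unicode Full Case Mapping. The helper approx_simple_upper() is hypothetical and only approximates Simple Case Mapping by dropping every mapping that is not one-to-one; it is not how PostgreSQL implements any of this.

```python
def binary_sort(strings):
    """Sort by raw UTF-8 bytes, mimicking a memcmp()-style comparison."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def approx_simple_upper(s):
    """Rough stand-in for Simple Case Mapping (an assumption for this sketch).

    Characters whose uppercase form expands to several characters are
    left unchanged; everything else uses the one-to-one mapping.
    """
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


words = ["apple", "Zebra", "äpfel"]

# Binary order: uppercase ASCII sorts before lowercase ASCII,
# and non-ASCII characters sort after both.
print(binary_sort(words))

# Full case mapping may change the string length ('ß' becomes 'SS').
print("straße".upper())

# The simple-mapping approximation leaves 'ß' untouched.
print(approx_simple_upper("straße"))
```

In all three cases the sort order stays the same; only the result of upper() differs.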

I'm still a little worried that changes in the case mapping might
break some indexes. We have a track record of going against the
standard when it comes to case conversion, so perhaps we wouldn't
spill too much milk if we only converted ASCII correctly.

But perhaps I am just being paranoid.
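
The worry about indexes can be sketched with a toy model in Python. This is only an illustration under stated assumptions: a dict stands in for an expression index on a case-folded value, ascii_lower() is a hypothetical "ASCII only" mapping, and str.lower() plays the role of a later, Unicode-aware mapping. None of the names correspond to PostgreSQL internals.

```python
def ascii_lower(s):
    """Hypothetical 'old' mapping: lowercase ASCII letters only."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def build_index(rows, fold):
    """Toy expression index: folded value -> stored row."""
    return {fold(value): value for value in rows}


def lookup(index, term, fold):
    """Probe the index with a search term folded by the given mapping."""
    return index.get(fold(term))


rows = ["Ärger", "Zebra"]

# The index keys are computed once, under the old mapping.
index = build_index(rows, ascii_lower)

# Pure-ASCII values fold identically under both mappings.
print(lookup(index, "ZEBRA", str.lower))

# The stored key is 'Ärger', but the new mapping probes with 'ärger'.
print(lookup(index, "ÄRGER", str.lower))

# Probing with the mapping the index was built under still works.
print(lookup(index, "ÄRGER", ascii_lower))
```

In this model, only the non-ASCII value goes missing after the mapping changes, and rebuilding the index under the new mapping would fix the lookup.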

Yours,
Laurenz Albe
