Re: Collation version tracking for macOS

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Tobias Bussmann <t(dot)bussmann(at)gmx(dot)net>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject:	Re: Collation version tracking for macOS
Date:	2022-06-10 05:56:35
Message-ID:	CA+hUKGLB5-OkBCO5JtGAoQU5wS-2v6w+quC+Sak00bfqOWJbcg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jun 10, 2022 at 12:48 PM Tobias Bussmann <t(dot)bussmann(at)gmx(dot)net> wrote:
> Perhaps I can shed some light on this matter:

Hi Tobias,

Oh, thanks for your answers. Definitely a few bits of interesting
archeology I was not aware of.

> Apple's libc collations have always been a bit special in that regard, even for the non-UTF8 ones. Rooted in ancient FreeBSD they "try to keep collating table backward compatible with ASCII" thus uppercase and lowercase characters are separated (there are exceptions like 'cs_CZ.ISO8859-2').

Wow. I see that I can sort the English dictionary the way most people
expect by pretending it's Czech. What a mess!

> With your smoke test "sort /usr/share/dict/words" on a modern macOS you won't see a difference between "C" and "en_US.UTF-8" but with "( echo '5£'; echo '£5' ) | LC_COLLATE=en_US.UTF-8 sort" you can produce a difference against "( echo '5£'; echo '£5' ) | LC_COLLATE=C sort". Or test with "diff -q <(LC_COLLATE=C sort /usr/share/dict/words) <(LC_COLLATE=es_ES.UTF-8 sort /usr/share/dict/words)"

I see, so it does *something*, just not what anybody wants.
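
For anyone who wants to repeat the comparison above, here is a rough sketch of the '5£' / '£5' check as a small script. The helper name sort_pair is made up for illustration, and the script sets LC_ALL rather than LC_COLLATE (as in the quoted commands) so that an LC_ALL already present in the environment cannot mask the setting. What the en_US.UTF-8 run prints depends on the platform's libc and on whether that locale is installed, so no particular result is assumed for it.

```shell
#!/bin/sh
# Sort the two sample strings from the quoted message under a given locale.
# sort_pair is a hypothetical helper name, not something from the thread.
sort_pair() {
    # LC_ALL overrides LC_COLLATE, so setting it here makes the test
    # independent of whatever the caller's environment already exports.
    printf '%s\n' '5£' '£5' | LC_ALL="$1" sort
}

c_order=$(sort_pair C)
utf8_order=$(sort_pair en_US.UTF-8)

# Report whether the two locales disagree; the answer is platform-dependent.
if [ "$c_order" = "$utf8_order" ]; then
    echo "same order under C and en_US.UTF-8"
else
    echo "order differs between C and en_US.UTF-8"
fi
```

Under the C locale the comparison is bytewise, so '5£' always comes first there; any difference reported comes from the en_US.UTF-8 collation table.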

In response to

Re: Collation version tracking for macOS at 2022-06-10 00:48:33 from Tobias Bussmann
