
Re: Add standard collation UNICODE

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Add standard collation UNICODE
Date:	2023-03-05 00:10:36
Message-ID:	3046556.1677975036@sss.pgh.pa.us
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Sun, 2023-03-05 at 08:27 +1300, Thomas Munro wrote:
>> It's created for UTF-8 only, and UTF-8 sorts the same way as the
>> encoded code points, when interpreted as a sequence of unsigned char
>> by memcmp(), strcmp() etc. Seems right?

> Right, makes sense.

> Though in principle, shouldn't someone using another encoding also be
> able to use ucs_basic? I'm not sure if that's a practical problem or
> not; I'm just curious. Does ICU provide a locale for sorting by code
> point?

ISTM we could trivially allow it in LATIN1 encoding as well;
strcmp would still have the effect of sorting by Unicode code points.
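
A minimal sketch of the ordering claim discussed above, written in Python rather than taken from PostgreSQL. The helper names and sample strings are hypothetical. It assumes that Python's `bytes` comparison (unsigned, byte by byte) is a fair stand-in for memcmp()/strcmp() on text without embedded NUL bytes, and it only spot-checks a few samples rather than proving the property.

```python
# Compare byte-wise ordering of encoded strings with ordering by code point.
# Hypothetical helpers for illustration only; this is not PostgreSQL code.

def sort_by_code_point(strings):
    # Order strings by their sequence of Unicode code point values.
    return sorted(strings, key=lambda s: [ord(ch) for ch in s])

def sort_by_encoded_bytes(strings, encoding):
    # Order strings by their encoded bytes, compared as unsigned values,
    # which is what memcmp() does on the stored representation.
    return sorted(strings, key=lambda s: s.encode(encoding))

# Samples for UTF-8: ASCII, two-byte, three-byte, and four-byte sequences.
utf8_samples = ["b", "a", "ab", "\u00e9", "\u20ac", "\uffff", "\U00010000", "\U0001f600"]

# Samples for LATIN1: every character lies in U+0000..U+00FF, where each
# byte value equals the code point it encodes.
latin1_samples = ["z", "A", "a", "ab", "\u00e0", "\u00e9", "\u00ff", "~"]

utf8_matches = (
    sort_by_encoded_bytes(utf8_samples, "utf-8") == sort_by_code_point(utf8_samples)
)
latin1_matches = (
    sort_by_encoded_bytes(latin1_samples, "latin-1") == sort_by_code_point(latin1_samples)
)
```

Both flags come out true for these samples. In LATIN1 the match is immediate because the encoding maps each byte to the code point of the same value; in UTF-8 it follows from the encoding's design, where larger code points get lexicographically larger byte sequences.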

Given the complete lack of field demand for making it work in
other encodings, I'm unexcited about spending more effort than that.

regards, tom lane

In response to

Re: Add standard collation UNICODE at 2023-03-04 23:56:48 from Jeff Davis
