Re: Add standard collation UNICODE

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add standard collation UNICODE
Date: 2023-03-05 00:10:36
Message-ID: 3046556.1677975036@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Sun, 2023-03-05 at 08:27 +1300, Thomas Munro wrote:
>> It's created for UTF-8 only, and UTF-8 sorts the same way as the
>> encoded code points, when interpreted as a sequence of unsigned char
>> by memcmp(), strcmp() etc.  Seems right?

> Right, makes sense.

> Though in principle, shouldn't someone using another encoding also be
> able to use ucs_basic? I'm not sure if that's a practical problem or
> not; I'm just curious. Does ICU provide a locale for sorting by code
> point?

ISTM we could trivially allow it in LATIN1 encoding as well;
strcmp would still have the effect of sorting by unicode code points.

Given the complete lack of field demand for making it work in
other encodings, I'm unexcited about spending more effort than that.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2023-03-05 00:13:13 Re: Request for comment on setting binary format output per session
Previous Message Tom Lane 2023-03-05 00:06:58 Re: Request for comment on setting binary format output per session