Re: How well does PostgreSQL 9.6.1 support unicode?

From: Steve Rogerson <steve(dot)pg(at)yewtc(dot)demon(dot)co(dot)uk>
To: James Zhou <james(at)360data(dot)ca>, <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: How well does PostgreSQL 9.6.1 support unicode?
Date: 2016-12-21 12:37:34
Message-ID: 10a9b3be-f813-085c-d6a7-285f6ae3f82b@yewtc.demon.co.uk
Lists: pgsql-general

On 21/12/16 05:24, Tom Lane wrote:
> James Zhou <james(at)360data(dot)ca> writes:
>> - *But their sorting order seems to be undefined. Can anyone comment
>> the sorting rules?*
>
> Well, it would depend on lc_collate, which you have not told us, and
> it would also depend on how well your platform's strcoll() function
> implements that collation; but you have not told us what platform this
> is running on.

As I understand it, when you first initialise a pg cluster with initdb, it
inherits the collation (locale) of the process that runs initdb.
Having said that, see:

https://www.postgresql.org/docs/9.6/static/collation.html

"If the operating system provides support for using multiple locales within a
single program (newlocale and related functions), then when a database cluster
is initialized, initdb populates the system catalog pg_collation with
collations based on all the locales it finds on the operating system at the time."

So pg is capable, in principle at least, of using any of the locales that
were available at the time initdb was run.
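
To illustrate why the answer depends on the collation in effect, here is a
minimal sketch in Python (not pg itself). It sorts the same strings under a
named locale via the C library's collation routines, which is roughly the kind
of dependence on strcoll() that Tom describes. The helper name
sort_with_collation is made up for this example, and only the "C" locale is
assumed to exist on every platform.

```python
import locale

def sort_with_collation(words, locale_name):
    """Sort words using the collation rules of the given locale.

    Hypothetical helper for illustration. It raises locale.Error if
    the requested locale is not installed on this machine.
    """
    # Remember the current LC_COLLATE setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that sorts per the locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

words = ["b", "A", "a", "B"]
# In the "C" locale, ordering follows code point values, so upper case
# letters sort before lower case ones.
print(sort_with_collation(words, "C"))  # ['A', 'B', 'a', 'b']
```

Passing a different locale name (for example "en_US.UTF-8", if it happens to
be installed) will typically give a different order, which is why lc_collate
and the platform both matter.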

>
> Most of the other behaviors you mention are also partly or wholly
> dependent on which software you use with Postgres and whether you've
> correctly configured that software. So it's pretty hard to answer
> this usefully with only this much info.
>

The more recent versions of perl (see http://perldoc.perl.org/perlunicode.htm
; maybe other languages too) know not only about code points but also about
"graphemes", so in the appropriate context "LATIN CAPITAL LETTER E WITH ACUTE"
can be considered "equal" to "LATIN CAPITAL LETTER E" followed by
"COMBINING ACUTE ACCENT", although they are 1 and 2 Unicode characters (code
points) respectively. This affects notions of equality as well as collation,
and it has implications for pg varchar(N) fields etc.
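
A small Python sketch of the point above, using only the standard unicodedata
module. It shows the two spellings of the same grapheme comparing unequal code
point by code point, having different lengths, and becoming equal after
normalisation. The helper canonically_equal is a name invented for this
example; it says nothing about what pg does internally.

```python
import unicodedata

# One code point: LATIN CAPITAL LETTER E WITH ACUTE
composed = "\u00C9"
# Two code points: LATIN CAPITAL LETTER E + COMBINING ACUTE ACCENT
decomposed = "E\u0301"

def canonically_equal(a, b):
    """Compare two strings after normalising both to NFC.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)                    # False
print(len(composed), len(decomposed))            # 1 2
print(canonically_equal(composed, decomposed))   # True
```

If a length limit such as varchar(N) counts code points rather than graphemes,
the two forms use up different amounts of the limit even though they display
identically.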

I would be interested to know what support pg has, or will have, for graphemes.

Steve
