Re: Collate order on Mac OS X, text with diacritics in UTF-8

From: Martin Flahault <martin(at)billjobs(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Collate order on Mac OS X, text with diacritics in UTF-8
Date: 2010-01-13 15:15:06
Message-ID: 2BAC69E9-7738-4F03-A149-83DC9F80729C@billjobs.com
Lists: pgsql-general


Here is an example:

postgres=# create database newbase;
CREATE DATABASE
postgres=# \c newbase;
psql (8.4.2)
You are now connected to database "newbase".
newbase=# create table t1 (contenu text);
CREATE TABLE
newbase=# insert into t1 values ('a'), ('e'), ('à'), ('é'), ('A'), ('E');
INSERT 0 6

newbase=# select * from t1 order by contenu;
contenu
---------
A
E
a
e
à
é
(6 rows)

newbase=# select * from t1 order by upper(contenu);
contenu
---------
a
A
e
E
à
é
(6 rows)

Here is the encoding information:

newbase=# \encoding
UTF8
newbase=# show lc_collate;
lc_collate
------------
fr_FR
(1 row)

newbase=# show lc_ctype;
lc_ctype
----------
fr_FR
(1 row)

As with other DBMSs (MySQL, for example), diacritics should be ignored when determining the primary sort order. Here is the expected output:
a
à
A
e
é
E
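
To make the expected ordering concrete, here is a rough Python sketch of a sort key that reproduces the list above. It is only an illustration under my own assumptions (base letter first, then lowercase before uppercase, then unaccented before accented); the helper names are hypothetical, and this is neither a full Unicode collation nor what PostgreSQL or MySQL does internally.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks removed (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expected_key(text):
    """Hypothetical key: base letters, then case, then diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    base = strip_diacritics(text)
    # swapcase() makes lowercase compare before uppercase in code point order
    return (base.lower(), base.swapcase(), decomposed)


values = ["a", "e", "à", "é", "A", "E"]
ordered = sorted(values, key=expected_key)
print(ordered)
```

With the six values from the table, this yields a, à, A, e, é, E.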

There seems to be a problem with the collation of text containing diacritics on BSD systems when using UTF-8.
If you put this text:
a
A
à
é
e
E

in a UTF-8 text file and run the "sort" command on it, you get the same wrong output as with PostgreSQL:
A
E
a
e
à
é
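
For what it is worth, this wrong output is exactly what a plain code point (or UTF-8 byte) comparison produces, which would be consistent with the locale's collation rules not being applied at all. I have not confirmed that this is the cause; the short Python sketch below only shows that the two orderings coincide for this data.

```python
values = ["a", "A", "à", "é", "e", "E"]

# Python compares str values by Unicode code point
by_code_point = sorted(values)

# Comparing the UTF-8 encoded bytes gives the same order for this data
by_utf8_bytes = sorted(values, key=lambda s: s.encode("utf-8"))

print(by_code_point)
print(by_utf8_bytes)
```

Both lists come out as A, E, a, e, à, é, the same order as in the psql and sort output.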

Hope this will help,

Martin
