Skip site navigation (1) Skip section navigation (2)

ORDER BY and Unicode

From: "M(dot) Bastin" <marcbastin(at)mindspring(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: ORDER BY and Unicode
Date: 2004-05-12 12:39:58
Message-ID: a06100515bcc7c7cdeb35@[192.168.0.101] (view raw or whole thread)
Thread:
Lists: pgsql-novice
There seems to be a big problem with Unicode for 
which a solution might already exist.  Somebody 
had the following problem on another mailing 
list.  My suggestion is at the bottom of this 
message but if another solution already exists 
I'd like to hear about it.

The problem is that special characters aren't 
treated right under Unicode.  Here are a few 
examples:

1.   "UPPER('é')" doesn't work.
(That's an accented "e" in there if it doesn't 
come through in your e-mail application)

The implication of this is that

     SELECT ... WHERE UPPER(mycolumn) LIKE UPPER('my search string')

doesn't give the functionality you'd want.
UPPER and LOWER seem to work on ASCII only.  The 
Greek, French, etc. are out of luck.


2.  "ORDER BY mycolumn" gives a wrong sort order.

Uppercase ASCII characters come first, then 
lowercase ASCII, then accented characters... 
This really isn't what a human would like to see.


I think the two examples above illustrate this 
Unicode problem quite well.  Is there an existing 
solution?  If not could we work together on 
creating one, as suggested at the very bottom of 
this message?

Thanks,

Marc

------------------------------------------------------

You can use the translate function to solve your problem.
<http://www.postgresql.org/docs/7.4/interactive/functions-string.html>

e.g. for the letter "a":

    SELECT * FROM mytable ORDER BY 
translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA');

Then you build an index like this to speed things up:

   CREATE INDEX MyIndex ON MyTable 
(translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA'));

And your select queries will also be case and 
accent independent from then on e.g:

    SELECT * FROM mytable WHERE 
translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA') 
LIKE translate('my search string', 'àáâäÀÁÂÄa', 
'AAAAAAAAA');


Since the alphabet is very long you're better off 
creating your own function that does this 
translation for the whole alphabet and then you'd 
get something like this:

    SELECT * FROM mytable WHERE 
MySimpleABC(textcolumn) LIKE MySimpleABC('my 
search string') ORDER BY MySimpleABC(textcolumn);

Your index would be like this:

   CREATE INDEX MyIndex ON MyTable (MySimpleABC(textcolumn));

Cheers,

Marc

PS:  Maybe we should work together to create the 
mother of all functions that would do this for 
all of Unicode?  Anybody else up to this?

In response to

Responses

pgsql-novice by date

Next:From: Stephan SzaboDate: 2004-05-12 13:51:08
Subject: Re: ORDER BY and Unicode
Previous:From: Sandro MartinezDate: 2004-05-12 07:07:12
Subject: ayuda

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group