Problem while ordering Turkish chars

From: Devrim GUNDUZ <devrim(at)gunduz(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Problem while ordering Turkish chars
Date: 2003-04-19 11:04:21
Message-ID: Pine.LNX.4.44.0304191307230.12333-100000@emo.org.tr
Lists: pgsql-hackers


Hi,

I'm experiencing some problems while ordering Turkish characters.

Here is what I mean:

Let's say we have two records:

Onder <- There are two dots on O. (ASCII CODE: 153)
Ozan

In the Turkish alphabet, O (without dots) comes before O (with dots). So
Ozan should be listed before Onder, but it is listed after Onder, since
PostgreSQL thinks that those two letters are the same...
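
To make the expected order concrete, here is a minimal Python sketch of it.
It is not how PostgreSQL collates and has nothing to do with the server; the
alphabet strings and the helper name turkish_key are only assumptions for
illustration. Characters outside the Turkish alphabet (spaces, Q, W, X) are
simply sorted ahead of the letters, which is a simplification.

```python
# Turkish alphabet in its usual order (29 letters), upper and lower case.
TR_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"
TR_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"

# Map each letter to its position; upper and lower case share a rank.
RANK = {}
for position, (upper, lower) in enumerate(zip(TR_UPPER, TR_LOWER)):
    RANK[upper] = position
    RANK[lower] = position


def turkish_key(text):
    """Hypothetical sort key: compare strings letter by letter by Turkish rank."""
    key = []
    for ch in text:
        if ch in RANK:
            key.append((1, RANK[ch]))   # known letter: order by alphabet position
        else:
            key.append((0, ord(ch)))    # anything else sorts before the letters
    return key


names = ["Önder", "Ozan"]
ordered = sorted(names, key=turkish_key)
print(ordered)
```

With this key, every name starting with O sorts before every name starting
with Ö, which is the order I expect from ORDER BY.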

To begin with:
operdb=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2
20020903 (Red Hat Linux 8.0 3.2-7)

(installed from rpm)

Here is a simple example. Lines 1, 10, 11 and 12 of the output have O with
dots as the first letter.
==========
SELECT oper_uzun_adi FROM operler ORDER BY oper_uzun_adi;
...

Ömer KORKMAZ
Oner BINBAS
Onur BAŞ
Onur TURHAN
Osman Selçuk SARIOĞLU
Osman TOPAN
Ozan GÜLEN
Ozan SARI
Ozge YAZICIOGLU
Özgür ÇAKICI
Özgür ÖZDEMİR
Özlem BAL
...
==========

But if I run a SELECT query with LIKE (ILIKE here), PostgreSQL gives me the
correct results:

operdb=# SELECT oper_uzun_adi FROM operler WHERE oper_uzun_adi ILIKE 'O%';
oper_uzun_adi
------------------------------------------
Osman TOPAN
Oner BINBAS
Ozge YAZICIOGLU
Onur BAŞ
Onur TURHAN
Ozan GÜLEN
Osman Selçuk SARIOĞLU
Ozan SARI
(8 rows)

Now for O with dots:

operdb=# SELECT oper_uzun_adi FROM operler WHERE oper_uzun_adi ILIKE 'Ö%';
oper_uzun_adi
------------------------------------------
Özgür ÇAKICI
Özlem BAL
Ömer KORKMAZ
Özgür ÖZDEMİR

So LIKE understands that they are different chars, but if I simply order
the rows, I do not get the correct result.

Here is the locale setting:

# Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'

If I change it to tr_TR, nothing changes.

Could anyone help me?

Best regards,
--
Devrim GUNDUZ
devrim(at)gunduz(dot)org devrim(dot)gunduz(at)linux(dot)org(dot)tr
http://www.gunduz.org
