Re: collation & UTF-8

From: Tomi NA <hefest(at)gmail(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tomi NA <hefest(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: collation & UTF-8
Date: 2006-02-24 17:45:40
Message-ID: d487eb8e0602240945g723a9900l84b01e9abcedd493@mail.gmail.com
Lists: pgsql-general

On 2/24/06, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>
> On Fri, Feb 24, 2006 at 06:23:07PM +0100, Tomi NA wrote:
> > I'm using PostgreSQL 8.1.2 on Linux and want to load UTF-8 encoded varchars.
> > While I can store and get at stored text correctly, the ORDER BY places all
> > accented characters (Croatian, in this case - probably marked hr_HR) after
> > non-accented characters.
> > This is no showstopper, but it does affect the general perception of
> > application quality.
>
> Collation is a function of the OS. Basically, is the locale of your
> database set up for UTF-8 collation? It would probably be called
> hr_HR.UTF-8.

You were right about this:
LC_ALL=hr_HR.UTF-8 sort < test.txt
(seemingly) collates the same way that pgsql does: accented letters end up at
the end of the alphabet. I've tried hr_HR.UTF8 as well, with no change.
Btw, my database is created with
CREATE DATABASE mydb
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default;

> Yes, set up the locale correctly. In general, PostgreSQL should give the
> same results as sort(1) on the command line. Use that to experiment.
>
> LC_ALL=hr_HR.UTF-8 sort < input > output

I'm very sorry to report it does not work. :(
Btw,
set | grep LC_
returns nothing. Is this a possible source of the problem?
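
A minimal sketch of one way to narrow this down, assuming a GNU/Linux system with sort(1) and locale(1) available. The sample words are made up for illustration. It sorts the same input in the C locale (plain byte order, where multi-byte UTF-8 letters land after all ASCII letters) and then, only if `locale -a` lists a Croatian UTF-8 locale, with hr_HR.UTF-8. If the locale is not installed, sort most likely falls back to byte order, which would match the behaviour described above.

```shell
# Hypothetical sample words; any UTF-8 text with Croatian letters would do.
words='zebra
čaj
cesta
šuma'

# Byte-order collation: the multi-byte letters sort after all ASCII letters.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf 'C locale:\n%s\n' "$c_sorted"

# locale -a lists installed locales; the name often appears as hr_HR.utf8.
if locale -a 2>/dev/null | grep -Eqi '^hr_HR\.utf-?8$'; then
    printf 'hr_HR.UTF-8:\n'
    printf '%s\n' "$words" | LC_ALL=hr_HR.UTF-8 sort
else
    printf 'hr_HR.UTF-8 does not appear to be installed\n'
fi
```

If both runs print the same order, the locale definition itself is the first thing to check. On the PostgreSQL side, `SHOW lc_collate;` reports the collation locale the cluster was initialized with.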

Tomislav
