Re: Unicode + LC_COLLATE

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: johnsw(at)wardbrook(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Unicode + LC_COLLATE
Date: 2004-04-22 13:39:05
Message-ID: 200404221539.05444.peter_e@gmx.net
Lists: pgsql-general

On Thursday, 22 April 2004 13:17, John Sidney-Woollett wrote:
> Does anyone know what the effect of --lc-collate=C --encoding=UNICODE will
> be for sorts (and indexes?) when a multibyte unicode character is
> encountered?

You get your strings sorted in binary order of the UTF-8 encoding, which is
probably not very interesting, but it's possible.
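To illustrate what "binary order of the UTF-8 encoding" means in practice, here is a minimal Python sketch (not PostgreSQL itself; the sample words are made up for illustration). It assumes a C-locale comparison behaves like a plain byte-wise comparison, and shows that sorting by UTF-8 bytes gives the same result as sorting by Unicode code point: uppercase ASCII first, then lowercase ASCII, then everything else by code point.

```python
# Hypothetical sample data; any mix of ASCII and non-ASCII strings works.
words = ["zebra", "Zebra", "apple", "éclair", "Ähre", "日本"]

# Sort by the raw UTF-8 bytes, mimicking a byte-wise (C locale style) comparison.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Python's default string comparison orders by Unicode code point.
by_code_point = sorted(words)

print(by_bytes)
print(by_bytes == by_code_point)
```

The order is stable and well defined, but it is not what a reader of any particular language would expect from a dictionary sort.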

> Is it also true that if LC_COLLATE != 'C' that indexes cannot be used for
> LIKE comparisons (and is this also true for en_US.iso885915)?

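No. In a non-C locale an index built with one of the pattern operator classes
(for example varchar_pattern_ops) can still be used for LIKE; see
<http://www.postgresql.org/docs/7.4/static/indexes-opclass.html>.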

> Our database is UNICODE with LC_COLLATE=en_US.iso885915. Does anyone know
> what the effect of someone storing a cyrillic/chinese or korean character
> is?

This setup will result in UTF-8 characters being sorted by the system thinking
they are actually ISO-8859-15 characters. So the result will be random at
best.
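A rough way to see the mismatch, sketched in Python rather than in the database (the sample strings are hypothetical): encode text as UTF-8 and then decode the same bytes as ISO-8859-15, which is approximately how a single-byte locale would view them. Each multibyte character falls apart into several unrelated single-byte characters, and those are what the collation rules end up comparing.

```python
# Hypothetical sample: three Cyrillic letters.
text = "Жук"

# What a UNICODE (UTF-8) database stores.
raw = text.encode("utf-8")

# What an ISO-8859-15 locale believes those bytes mean.
misread = raw.decode("iso8859_15")

# Each 2-byte Cyrillic letter has become two unrelated Latin-9 characters.
print(len(text), len(raw), len(misread))

# Same effect on an accented Latin letter.
e_acute_misread = "é".encode("utf-8").decode("iso8859_15")
print(repr(e_acute_misread))
```

The sketch only shows how the bytes are misinterpreted; the order that actually results depends on the platform's locale implementation.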

> (We are using JDBC with a webapp so all the unicode concerns are
> handled transparently, apparently). When the data is extracted from the DB
> will it render correctly in the browser provided we send all responses
> encoded in UTF-8?

If your database is in UNICODE and you're using JDBC then you should be all
set as far as PostgreSQL is concerned. Of course, your HTML pages need to
declare the encoding correctly as well.
