Unicode + LC_COLLATE

From: "John Sidney-Woollett" <johnsw(at)wardbrook(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Unicode + LC_COLLATE
Date: 2004-04-22 11:17:17
Message-ID: 3282.192.168.0.64.1082632637.squirrel@mercury.wardbrook.com
Lists: pgsql-general

Priem, Alexander said:
> I recreated my entire database (luckily I keep scripts for
> table/index/view creation) and initdb-ed it using --lc-collate=C
> --encoding=UNICODE. In my psqlODBC DSN settings I added
> "set client_encoding='LATIN9';" to the Connect Settings and that solved
> all my problems regarding the special characters.

Does anyone know what effect --lc-collate=C --encoding=UNICODE has on
sorts (and indexes?) when a multibyte Unicode character is encountered?
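
As a rough illustration of the sorting question, the sketch below assumes
that a C collation compares strings byte by byte. It is not PostgreSQL
code, and the sample words are hypothetical. It shows that ordering UTF-8
strings by their raw bytes gives the same result as ordering them by
Unicode code point, which is consistent but not linguistically aware.

```python
# Hypothetical sample words; none of these come from the thread.
words = ["zebra", "Zebra", "éclair", "eclair", "Ärger", "apple", "日本", "я"]


def bytewise_sort(strings):
    """Sort by raw UTF-8 bytes, mimicking a memcmp-style comparison."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


byte_order = bytewise_sort(words)
# Python compares str values by code point, so this is code point order.
codepoint_order = sorted(words)

print(byte_order)
print(byte_order == codepoint_order)
```

Under this assumption, uppercase ASCII sorts before lowercase ASCII, and
every accented or non-Latin character sorts after all of ASCII, so "Ärger"
lands after "zebra" rather than next to "apple".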

Is --lc-collate=C --encoding=UNICODE even valid? And if it's valid, what
unexpected nasties could it cause?

Is it also true that if LC_COLLATE != 'C', indexes cannot be used for
LIKE comparisons (and is this also true for en_US.iso885915)?
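
To show why byte-wise ordering matters for prefix matching, here is a
minimal sketch of turning a LIKE 'Jo%' style search into a range lookup
over a byte-sorted list. The sorted list stands in for an index; the names
prefix_range and like_prefix are hypothetical, and this is not how
PostgreSQL is implemented. The trick relies on every string with a given
prefix sorting contiguously, which holds for byte order but is not
guaranteed by locale-aware collations.

```python
import bisect


def prefix_range(prefix: bytes):
    """Return (low, high) with low <= s < high for every s starting with prefix.

    Assumes a non-empty prefix whose last byte is below 0xFF, which is
    always the case for valid UTF-8.
    """
    upper = bytearray(prefix)
    upper[-1] += 1
    return prefix, bytes(upper)


def like_prefix(sorted_keys, prefix: str):
    """Find all keys starting with prefix using two binary searches."""
    low, high = prefix_range(prefix.encode("utf-8"))
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_left(sorted_keys, high)
    return [key.decode("utf-8") for key in sorted_keys[start:end]]


# Hypothetical data, sorted by raw bytes to stand in for a C-locale index.
names = ["John", "Johanna", "Jon", "joan", "Karsten", "Jim", "Joé"]
index = sorted(name.encode("utf-8") for name in names)

matches = like_prefix(index, "Jo")
print(matches)
```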

Our database is UNICODE with LC_COLLATE=en_US.iso885915. Does anyone know
what happens when someone stores a Cyrillic, Chinese or Korean character?
(We are using JDBC with a webapp, so all the Unicode concerns are handled
transparently, apparently.) When the data is extracted from the DB, will it
render correctly in the browser provided we send all responses encoded in
UTF-8?
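
The encoding half of that question can be checked outside the database.
The sketch below uses hypothetical sample strings and a hypothetical helper
named round_trips. It shows that Cyrillic, Chinese and Korean text survives
a UTF-8 encode/decode round trip unchanged, while ISO-8859-15 cannot
represent it at all. It says nothing about how the server's LC_COLLATE
setting treats such data when sorting.

```python
# Hypothetical sample strings; none of these come from the thread.
samples = {
    "cyrillic": "Привет",
    "chinese": "你好",
    "korean": "안녕하세요",
}


def round_trips(text: str, encoding: str) -> bool:
    """Return True if text encodes and decodes back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The target character set has no representation for this text.
        return False


utf8_ok = {name: round_trips(text, "utf-8") for name, text in samples.items()}
latin9_ok = {name: round_trips(text, "iso-8859-15") for name, text in samples.items()}

print(utf8_ok)
print(latin9_ok)
```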

Although http://www.postgresql.org/docs/7.4/interactive/charset.html
describes the Postgres-specific implementation and how to configure for a
given locale, the subtle nuances of combining an encoding with LC_COLLATE,
and the tradeoffs involved, are not entirely clear (to me at least). For
example, are the performance penalties of using UNICODE over ASCII
significant?

Maybe it's just my inexperience but this topic seems to cause lots of
questions. A good/simple technote would be really useful... I'd do one but
I really don't know my ass from my elbow around this topic (and probably
many others too!).

Thanks for any answers/feedback/more info.

John Sidney-Woollett
