Re: Unicode database on non-unicode operating system

From: "Morten Barklund" <morten(dot)barklund(at)tbwa(dot)dk>
To: <pgsql-general(at)postgresql(dot)org>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Subject: Re: Unicode database on non-unicode operating system
Date: 2008-07-15 14:19:23
Message-ID: AB6A9C75F1620048B14C9E7D9526F5B136CB92@TBWAMAIL.tbwa.dk
Lists: pgsql-general

Hi Peter,

Thank you once again. That cleared up a lot of confusion for me and my
co-workers, and the next server we set up will use Unicode with en_DK.utf8
to ensure consistency.

Regards,
Morten Barklund

-----Original Message-----
From: Peter Eisentraut [mailto:peter_e(at)gmx(dot)net]
Sent: Tuesday, July 15, 2008 3:50 PM
To: pgsql-general(at)postgresql(dot)org
Cc: Morten Barklund
Subject: Re: [GENERAL] Unicode database on non-unicode operating system

On Tuesday, 15 July 2008, Morten Barklund wrote:
> I can see that lc_collate (sorting) and lc_ctype (lower-upper conversion)
> are set to en_DK, and I guess that the default encoding for en_DK is
> iso88591 or maybe windows1252.

It is ISO-8859-1. There is no support for Windows charmaps on Linux.

> Thus my server should have been initialized with
> en_DK.utf8, right?

Yes, or you should have chosen a different encoding (LATIN1 in your case) when
creating the database.

> How do I find out what the default encoding for the locale en_DK is?

$ LC_ALL=en_DK locale charmap
ISO-8859-1

Note that this is not the "default" encoding, it is the *only* encoding
supported by that locale.

> I can see, that normally one would sub-specify this by either
> adding .iso88591 or .utf8, but is windows1252 then default?

It might be reasonable to use the .iso88591 or .utf8 suffixes if you want to
be explicit, but the unsuffixed locale name is usually just an alias for one
of these.

> I am not able to reinitdb, as many other databases are running and might
> be negatively affected. Does this mean that even though my database is
> created WITH ENCODING 'unicode', it is in fact "broken", as the locale does
> not fully support unicode string handling?

Yes. If you can't reinitdb, then you should recreate the database with
encoding LATIN1. This won't allow all Unicode characters, obviously, but at
least you get proper behavior for the Danish characters that you need.
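
As a rough illustration of that trade-off, the sketch below checks which
strings fit into ISO-8859-1 (the character set PostgreSQL calls LATIN1). It
is not part of the original exchange: the helper name fits_in_latin1 and the
sample strings are made up for the example, and it relies only on Python's
built-in "latin-1" codec, not on PostgreSQL itself.

```python
# Python's "latin-1" codec implements ISO-8859-1, which PostgreSQL calls LATIN1.
# The helper name and sample strings below are illustrative assumptions.
DANISH_LETTERS = "æøåÆØÅ"


def fits_in_latin1(text: str) -> bool:
    """Return True if every character of `text` can be encoded as ISO-8859-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Danish text fits; the euro sign and Greek letters lie outside ISO-8859-1.
    for sample in (DANISH_LETTERS, "Smørrebrød", "€", "Ω"):
        print(f"{sample!r}: {fits_in_latin1(sample)}")
```

A check like this could be run over existing data before dumping and
reloading into a LATIN1 database, to spot rows that would not survive the
conversion. Note that the euro sign is one such character: it exists in
Windows-1252 but not in ISO-8859-1.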
