Database encoding and locale

From: "BRUSSER Michael" <Michael(dot)BRUSSER(at)3ds(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Database encoding and locale
Date: 2010-12-06 21:04:44
Message-ID: 29EA9BFEF7E7FC4F988818CA0C5C78C20564A7@AG-DCC-MBX03.dsone.3ds.com
Lists: pgsql-general

I would appreciate some pointers on using database encoding and locale.

This is the error message I get from initdb, on Sun Solaris 5.10:
initdb: encoding mismatch
The encoding you selected (UTF8) and the encoding that the
selected locale uses (LATIN1) do not match. This would lead to
misbehavior in various character string processing functions.
Rerun initdb and either do not specify an encoding explicitly,
or choose a matching combination.

In the database setup log I see this:
The database cluster will be initialized with locales
COLLATE: en_US.ISO8859-1
CTYPE: en_US.ISO8859-1
MESSAGES: C
MONETARY: en_US.ISO8859-1
NUMERIC: en_US.ISO8859-1
TIME: en_US.ISO8859-1

Checking on the environment:
% locale
LANG=
LC_CTYPE=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
LC_TIME=en_US.ISO8859-1
LC_COLLATE=en_US.ISO8859-1
LC_MONETARY=en_US.ISO8859-1
LC_MESSAGES=C
LC_ALL=

I don't understand why initdb cannot work with UTF8 and the given locale. The error message suggests
'... do not specify an encoding explicitly' but this is what I did.
Setting the LANG environment variable did not help. I am obviously missing something here.

In the old days we only used '-E UNICODE' with initdb; now, with v8.4.4, I've changed it to '-E UTF8',
but I am not quite sure what to do about the locale. If I provide an explicit value, --locale=en_US.UTF-8,
initdb succeeds, but this may not be the best option for installations outside the US, which brings up a second question:
How should the database be initialized if the user's environment is not known up front?
Much of the data in the database will be in the local language, whether it's French, Dutch, English, or something else.
Some data will always be in English.
Could I get away with something generic like C or POSIX?
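
For concreteness, the two invocations I am comparing look roughly like the sketch below. The helper name initdb_cmd and the data directory path are placeholders of mine, and the function only prints the command line rather than running initdb:

```shell
#!/bin/sh
# Dry run: print the initdb command line instead of executing it.
# $1 = locale name, $2 = data directory (placeholder path below).
initdb_cmd() {
    printf 'initdb -D %s -E UTF8 --locale=%s\n' "$2" "$1"
}

initdb_cmd en_US.UTF-8 /path/to/data   # the combination that succeeds for me
initdb_cmd C /path/to/data             # the generic alternative I am asking about
```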

We run on a few UNIX platforms, and I guess 'locale -a' is always available, but my first choice would be not
to define the locale dynamically, and not to prompt the user for this parameter. Not sure if I'm on the right track...
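
If it did have to be dynamic, the selection could be sketched as below. This is only an illustration under my own assumptions: pick_utf8_locale is a hypothetical helper, it reads a list of locale names on stdin (such as the output of 'locale -a'), and it falls back to C, which as far as I can tell from the documentation is accepted with any encoding, at the cost of byte-order sorting.

```shell
#!/bin/sh
# Hypothetical helper: from a list of locale names on stdin, print the first
# one that matches the given language_territory prefix and uses UTF-8.
# $1 = preferred prefix, e.g. en_US or fr_FR.
# Prints "C" when no matching UTF-8 locale is found.
pick_utf8_locale() {
    prefix=$1
    # Accept spellings such as en_US.UTF-8, en_US.utf8 (case-insensitive).
    match=$(grep -i "^${prefix}\.utf-*8\$" | head -n 1)
    if [ -n "$match" ]; then
        printf '%s\n' "$match"
    else
        printf 'C\n'
    fi
}

# Intended use (not executed here; PGDATA is assumed to be set):
#   loc=$(locale -a | pick_utf8_locale fr_FR)
#   initdb -D "$PGDATA" -E UTF8 --locale="$loc"
```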

Thanks,
Michael.

