8.3.0, locales, and encodings

From: "Dean Gibson (DB Administrator)" <postgresql(at)ultimeth(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: 8.3.0, locales, and encodings
Date: 2008-02-07 02:29:02
Message-ID: 47AA6CEE.4010406@ultimeth.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

I got a small surprise when upgrading to 8.3.0 today (CentOS 4.4). When
I went to restore my database, PSQL complained that my "CREATE
DATABASE... ENCODING 'UTF8';" wasn't valid (the new phpBB 3.0 software
enforces using UTF8). Sure enough, a bit of research showed that the
supplied /etc/init.d/postgresql file had no encoding specified, which
meant that it defaulted to my operating system's 'en_US.ISO-8859-1'
encoding. A quick reading of the PostgreSQL documentation revealed that
PostgreSQL can't really use an encoding different from that specified
during INITDB (caveat below), and apparently that is now enforced in v8.3.

OK, so I manually run INITDB w/ "-E UTF8", and now I can create/load one
of my databases. Unfortunately, when I go to create another database
(where the data is 'ISO-8859-1'), now IT won't create. A more detailed
reading of the PostgreSQL documentation revealed that PostgreSQL can't
use an encoding different from that specified during INITDB, UNLESS
INITDB is run w/ "-locale POSIX".

Ok, I rerun INITDB w/ "-locale POSIX -E UTF8", and now both databases
create/load, apparently properly. I also did an INITDB on another
database server, running INITDB w/ just "-locale POSIX" (apparently
meaning that the "template1" encoding defaulted to "SQL_ASCII"), and
successfully create/reload another database there (explicitly specifying
an ENCODING there).

Questions/suggestions:

1. The provided /etc/init.d/postgresql file seems to be part of this
trap. Should it be changed to default the encoding to POSIX?
2. I want my installation to have the maximum flexibility. Should I be
running INITDB w/ "-locale POSIX -E UTF8", or just "-locale POSIX"
(implying "-E SQL_ASCII")? Or does it make a difference, since I always
specify the desired database encoding when I "CREATE DATABASE...
ENCODING '...';"?
3. One of my databases is ISO-8859-1, because that's the format of the
data I get from the US gov't. However, should I instead use the 'UTF8'
encoding for that database, and then use a client (PSQL) encoding of
"LATIN1", and let PostgreSQL convert the data? Pros & cons?

Performance is a concern, but disk space is not.

I presume this ("admin") is the proper PostgreSQL list to post this to.
If not, let me know.

-- Dean Gibson

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Mag Gam 2008-02-07 03:52:51 Re: rename database
Previous Message antonio freitas 2008-02-07 00:54:56 password management