Recommended approach for upgrading DBs with nonmatching encodings

From: Martin Pitt <martin(at)piware(dot)de>
To: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Recommended approach for upgrading DBs with nonmatching encodings
Date: 2008-03-30 18:40:50
Message-ID: 20080330184050.GM6486@piware.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello all,

as already known, 8.3 now enforces a match between DB encoding and
server locale [1]. I agree that this is a Good Thing™, but it causes
automatic upgrades from previous versions to 8.3 to fail with
something like

pg_restore: [archiver (db)] could not execute query: ERROR: encoding LATIN1 does not match server's locale it_IT.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.
Command was:
CREATE DATABASE lixpergroupware WITH TEMPLATE = template0 ENCODING = 'LATIN1';

There were some proposals [3][4] to have the server run under locale C
or POSIX, but IMHO this is rather an aggravation than a viable
solution (locales are a good thing), so I rather don't do that.

My gut feeling is that the right approach would be to create all
target (8.3) databases with a correct encoding that matches the server
locale and have the character data from the pg_dump converted on the
fly (with iconv, or with pg_dump itself), so that people get from a
broken 8.1/8.2 setup to a good 8.3 setup painlessly.

I'm seeking some input on (1) whether above approach is the correct
one, and (2) suggestions how to implement it properly.

My current pg_upgradecluster uses pg_dumpall to copy the schema, and a
per-db pg_dump to copy the DB table contents. Will calling pg_dump
with --encoding=<encoding of 8.3 server's locale> always DTRT and is
it meant to solve this problem? The common case is --encoding=UTF-8,
but of course in theory someone might also have it the other way
around, so the upgrade should fail if it encounters an UTF-8 character
which cannot be encoded into an. e. g. LATIN1 character.

Thank you all for any suggestion,

Martin

[1] http://www.nabble.com/BUG--3924:-Create-Database-with-another-encoding-as-the-encoding-from-postgres-td15246051.html
[2] http://bugs.debian.org/472930
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=472930#10
[4] https://bugs.launchpad.net/207779

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Martin Pitt 2008-03-30 20:39:38 Re: Recommended approach for upgrading DBs with nonmatching encodings
Previous Message Stephan Szabo 2008-03-30 17:51:14 Re: The problem with FULL JOIN