Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows

From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-admin(at)postgresql(dot)org, meetesh(dot)karia(at)alumni(dot)duke(dot)edu
Subject: Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows
Date: 2008-03-13 03:28:42
Message-ID: 200803122328.42683.xzilla@users.sourceforge.net
Lists: pgsql-admin pgsql-hackers

On Wednesday 12 March 2008 09:37, Meetesh Karia wrote:
> One quick addition to this:
>
> The column I'm creating this unique index on is a varchar(255) and the
> command I was running was:
>
> create unique index foo_name on foo (name);
>
> If I use the following, it now works:
>
> create unique index foo_name on foo (cast(name as bytea));
>
> Thoughts?
>
> Meetesh
>
> Meetesh Karia wrote:
> > Hi all,
> >
> > I'm trying to migrate from 8.0.14 on Windows (Vista Home Premium) to
> > 8.3.0 and I've been trying to solve what appears to be an encoding
> > problem. My old db was in the UNICODE encoding. I know that this
> > isn't supported on 8.0.x, but it was a restore of a db from a Linux
> > environment and postgres didn't appear to have any problems with it.
> >
> > My 8.3 server and client encodings are UTF8 and I used pg_dumpall (I
> > tried the 8.0 and 8.3 versions) to dump the db. However, when I tried
> > to restore the db, I got an error during index creation which wouldn't
> > let me create a unique index on a column that had all unique values
> > (it had the index in 8.0 and a group by having query with no indexes
> > on the table confirms uniqueness). The thing that this column does
> > have however is values like:
> >
> > 'Bruehl'
> > 'Brühl'
> >
> > I created a blank table with the unique index on it and inserted rows
> > one at a time until I confirmed that it was the above values that were
> > causing a problem. Running the following query shows the difference
> > in the hex encoded values (I changed my client encoding to WIN1250 to
> > get the below to show up correctly):
> >
> > select name, encode(decode(name, 'escape'), 'hex') from ...
> >
> >      name      |           encode
> > ---------------+----------------------------
> >  Daniel Brühl  | 44616e69656c204272c3bc686c
> >  Daniel Bruehl | 44616e69656c2042727565686c
> > (2 rows)
> >
> > I've also tried exporting using an encoding of WIN1250 but I get
> > errors like this:
> >
> > pg_dump: Error message from server: ERROR: character 0xc383 of
> > encoding "UNICODE" has no equivalent in "WIN1250"
> >
> > Anyone have any thoughts or suggestions? Why would the index creation
> > fail? Is there a workaround?
> >

I'm not convinced your problem isn't solved by proper setting of
client_encoding for both input and output:

pagila=# create table x (r varchar(255) unique);
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x_r_key" for table "x"
CREATE TABLE
pagila=# set client_encoding=WIN1250;
SET
pagila=# insert into x (r) values ('Daniel Brühl');
INSERT 0 1
pagila=# insert into x (r) values ('Daniel Bruehl');
INSERT 0 1
pagila=# select * from x;
       r
---------------
 Daniel Brühl
 Daniel Bruehl
(2 rows)
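
For anyone who wants to double-check the bytes outside the database, here is a rough Python sketch (not part of the original exchange). It assumes the hex dumps quoted above are plain UTF-8 and that Python's "cp1250" codec is a fair stand-in for the server's WIN1250. The helper names utf8_hex and fits_win1250 are made up for illustration.

```python
# Sketch only: re-derive the hex dumps from the quoted query and look at
# the 0xc383 sequence named in the pg_dump error.

def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as a lowercase hex string."""
    return text.encode("utf-8").hex()

def fits_win1250(ch: str) -> bool:
    """True if ch can be encoded with Python's cp1250 codec
    (assumed here to match PostgreSQL's WIN1250)."""
    try:
        ch.encode("cp1250")
        return True
    except UnicodeEncodeError:
        return False

names = ["Daniel Brühl", "Daniel Bruehl"]
hex_by_name = {name: utf8_hex(name) for name in names}

# The two bytes from the error message, read as UTF-8.
stray = bytes.fromhex("c383").decode("utf-8")

# ascii() keeps the output printable on any console encoding.
for name, hexed in hex_by_name.items():
    print(ascii(name), hexed)
print(ascii(stray), fits_win1250(stray), fits_win1250("ü"))
```

The two names come out as different byte strings, matching the dumps above, so at the byte level they are distinct. The bytes 0xc383 decode to U+00C3, which cp1250 cannot represent while "ü" can be; that is consistent with the pg_dump error, though whether it points at double-encoded data in the old database is only a guess.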

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
