Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org, meetesh(dot)karia(at)alumni(dot)duke(dot)edu
Subject: Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows
Date: 2008-03-13 10:00:41
Message-ID: 47D8FB49.6000406@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Thanks for your response Robert.

Unfortunately I don't think that will work (unless I'm misunderstanding
something).

I can't export using a client encoding of WIN1250 because I have true
UNICODE chars which can't be represented with it (I've tried and I get
the error below). Therefore, I have a sql dump with UNICODE chars and
setting the client encoding to WIN1250 when I run the sql dump script
would toast the characters.

Additionally, here's what I get when I run your test below (my server
encoding is UTF-8):

ltefull=# create table x (r varchar(255) unique);
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x_r_key" for
table "x"
CREATE TABLE
ltefull=#
ltefull=# set client_encoding=WIN1250;
SET
ltefull=# insert into x (r) values ('Daniel Brühl');
INSERT 0 1
ltefull=#
ltefull=# insert into x (r) values ('Daniel Bruehl');
ERROR: duplicate key value violates unique constraint "x_r_key"

Meetesh

Robert Treat wrote:
> On Wednesday 12 March 2008 09:37, Meetesh Karia wrote:
>
>> One quick addition to this:
>>
>> The column I'm creating this unique index on is a varchar(255) and the
>> command I was running was:
>>
>> create unique index foo_name on foo (name);
>>
>> If I use the following, it now works:
>>
>> create unique index foo_name on foo (cast(name as bytea));
>>
>> Thoughts?
>>
>> Meetesh
>>
>> Meetesh Karia wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to migrate from 8.0.14 on Windows (Vista Home Premium) to
>>> 8.3.0 and I've been trying to solve what appears to be an encoding
>>> problem. My old db was in the UNICODE encoding. I know that this
>>> isn't supported on 8.0.x, but it was a restore of a db from a Linux
>>> environment and postgres didn't appear to have any problems with it.
>>>
>>> My 8.3 server and client encodings are UTF8 and I used pg_dumpall (I
>>> tried the 8.0 and 8.3 versions) to dump the db. However, when I tried
>>> to restore the db, I got an error during index creation which wouldn't
>>> let me create a unique index on a column that had all unique values
>>> (it had the index in 8.0 and a group by having query with no indexes
>>> on the table confirms uniqueness). The thing that this column does
>>> have however is values like:
>>>
>>> 'Bruehl'
>>> 'Brühl'
>>>
>>> I created a blank table with the unique index on it and inserted rows
>>> one at a time until I confirmed that it was the above values that were
>>> causing a problem. Running the following query shows the difference
>>> in the hex encoded values (I changed my client encoding to WIN1250 to
>>> get the below to show up correctly):
>>>
>>> select name, encode(decode(name, 'escape'), 'hex') from ...
>>>
>>> name | encode
>>> ---------------+----------------------------
>>> Daniel Brühl | 44616e69656c204272c3bc686c
>>> Daniel Bruehl | 44616e69656c2042727565686c
>>> (2 rows)
>>>
>>> I've also tried exporting using an encoding of WIN1250 but I get
>>> errors like this:
>>>
>>> pg_dump: Error message from server: ERROR: character 0xc383 of
>>> encoding "UNICODE" has no equivalent in "WIN1250"
>>>
>>> Anyone have any thoughts or suggestions? Why would the index creation
>>> fail? Is there a workaround?
>>>
>>>
>
> I'm not convinced your problem isn't solved by proper setting of
> client_encoding for both input and output:
>
> pagila=# create table x (r varchar(255) unique);
> NOTICE: CREATE TABLE / UNIQUE will create implicit index "x_r_key" for
> table "x "
> CREATE TABLE
> pagila=# set client_encoding=WIN1250;
> SET
> pagila=# insert into x (r) values ('Daniel Brühl');
> INSERT 0 1
> pagila=# insert into x (r) values ('Daniel Bruehl');
> INSERT 0 1
> pagila=# select * from x;
> r
> ---------------
> Daniel Brühl
> Daniel Bruehl
> (2 rows)
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Meetesh Karia 2008-03-13 12:01:59 Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows
Previous Message Pietro 2008-03-13 09:20:07 LAN connection to server only after restart server

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2008-03-13 11:40:08 Re: Nasty bug in heap_page_prune
Previous Message Simon Riggs 2008-03-13 08:24:45 Re: CSStorm occurred again by postgreSQL8.2