Skip site navigation (1) Skip section navigation (2)

Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: meetesh(dot)karia(at)alumni(dot)duke(dot)edu
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows
Date: 2008-03-13 12:01:59
Message-ID: 47D917B7.3050106@gmail.com (view raw or flat)
Thread:
Lists: pgsql-adminpgsql-hackers
Thanks to help from Sergey Benner, I was able to narrow the problem 
down.  When I first initialized the PG cluster, I used the following 
command:

initdb -D d:/postgres/8.3/data -E UTF8 -U postgres

And that set my LC_COLLATE and LC_CTYPE values to 'English_United 
States.1252'.  'en_US-UTF-8' would be ideal for me but that doesn't seem 
to exist on Windows.

Anyhow, when I recreated the PG cluster using this command:

initdb -D ./data -E UTF8 --locale=C -U postgres

It now correctly handles the values and allows me to create the unique 
index.  I can use the C locale, but I'm a bit surprised that the default 
windows locale wouldn't work.

Is that to be expected?

Thanks,
Meetesh

Meetesh Karia wrote:
> Thanks for your response Robert.
>
> Unfortunately I don't think that will work (unless I'm 
> misunderstanding something).
>
> I can't export using a client encoding of WIN1250 because I have true 
> UNICODE chars which can't be represented with it (I've tried and I get 
> the error below).  Therefore, I have a sql dump with UNICODE chars and 
> setting the client encoding to WIN1250 when I run the sql dump script 
> would toast the characters.
>
> Additionally, here's what I get when I run your test below (my server 
> encoding is UTF-8):
>
> ltefull=# create table x (r varchar(255) unique);
> NOTICE:  CREATE TABLE / UNIQUE will create implicit index "x_r_key" 
> for table "x"
> CREATE TABLE
> ltefull=#
> ltefull=# set client_encoding=WIN1250;
> SET
> ltefull=# insert into x (r) values ('Daniel Brühl');
> INSERT 0 1
> ltefull=#
> ltefull=# insert into x (r) values ('Daniel Bruehl');
> ERROR:  duplicate key value violates unique constraint "x_r_key"
>
> Meetesh
>
> Robert Treat wrote:
>> On Wednesday 12 March 2008 09:37, Meetesh Karia wrote:
>>   
>>> One quick addition to this:
>>>
>>> The column I'm creating this unique index on is a varchar(255) and the
>>> command I was running was:
>>>
>>> create unique index foo_name on foo (name);
>>>
>>> If I use the following, it now works:
>>>
>>> create unique index foo_name on foo (cast(name as bytea));
>>>
>>> Thoughts?
>>>
>>> Meetesh
>>>
>>> Meetesh Karia wrote:
>>>     
>>>> Hi all,
>>>>
>>>> I'm trying to migrate from 8.0.14 on Windows (Vista Home Premium) to
>>>> 8.3.0 and I've been trying to solve what appears to be an encoding
>>>> problem.  My old db was in the UNICODE encoding.  I know that this
>>>> isn't supported on 8.0.x, but it was a restore of a db from a Linux
>>>> environment and postgres didn't appear to have any problems with it.
>>>>
>>>> My 8.3 server and client encodings are UTF8 and I used pg_dumpall (I
>>>> tried the 8.0 and 8.3 versions) to dump the db.  However, when I tried
>>>> to restore the db, I got an error during index creation which wouldn't
>>>> let me create a unique index on a column that had all unique values
>>>> (it had the index in 8.0 and a group by having query with no indexes
>>>> on the table confirms uniqueness).  The thing that this column does
>>>> have however is values like:
>>>>
>>>> 'Bruehl'
>>>> 'Brühl'
>>>>
>>>> I created a blank table with the unique index on it and inserted rows
>>>> one at a time until I confirmed that it was the above values that were
>>>> causing a problem.  Running the following query shows the difference
>>>> in the hex encoded values (I changed my client encoding to WIN1250 to
>>>> get the below to show up correctly):
>>>>
>>>> select name, encode(decode(name, 'escape'), 'hex') from ...
>>>>
>>>>      name      |           encode
>>>> ---------------+----------------------------
>>>>  Daniel Brühl  | 44616e69656c204272c3bc686c
>>>>  Daniel Bruehl | 44616e69656c2042727565686c
>>>> (2 rows)
>>>>
>>>> I've also tried exporting using an encoding of WIN1250 but I get
>>>> errors like this:
>>>>
>>>> pg_dump: Error message from server: ERROR:  character 0xc383 of
>>>> encoding "UNICODE" has no equivalent in "WIN1250"
>>>>
>>>> Anyone have any thoughts or suggestions?  Why would the index creation
>>>> fail?  Is there a workaround?
>>>>
>>>>       
>>
>> I'm not convinced your problem isn't solved by proper setting of 
>> client_encoding for both input and output:
>>
>> pagila=# create table x (r varchar(255) unique);
>> NOTICE:  CREATE TABLE / UNIQUE will create implicit index "x_r_key" for 
>> table "x "
>> CREATE TABLE
>> pagila=# set client_encoding=WIN1250;
>> SET
>> pagila=# insert into x (r) values ('Daniel Brühl');
>> INSERT 0 1
>> pagila=# insert into x (r) values ('Daniel Bruehl');
>> INSERT 0 1
>> pagila=# select * from x;
>>        r
>> ---------------
>>  Daniel Brühl
>>  Daniel Bruehl
>> (2 rows)
>>
>>   

In response to

pgsql-hackers by date

Next:From: Alvaro HerreraDate: 2008-03-13 12:44:37
Subject: Re: Reducing Transaction Start/End Contention
Previous:From: Pavan DeolaseeDate: 2008-03-13 11:40:08
Subject: Re: Nasty bug in heap_page_prune

pgsql-admin by date

Next:From: Julio LeyvaDate: 2008-03-13 12:46:56
Subject: Re: LAN connection to server only after restart server
Previous:From: Meetesh KariaDate: 2008-03-13 10:00:41
Subject: Re: Encoding problems with migration from 8.0.14 to 8.3.0 on Windows

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group