Re: UTF-8 data migration problem in Postgresql 7.2

From: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-odbc(at)postgresql(dot)org, Inoue(at)tpf(dot)co(dot)jp
Subject: Re: UTF-8 data migration problem in Postgresql 7.2
Date: 2002-02-21 09:13:23
Message-ID: 200202210913.g1L9DNFP032755@www1.translationforge
Lists: pgsql-hackers, pgsql-odbc
Dear Tatsuo,

Thank you for your previous answer.

> o Were server/client encodings UTF-8 for PostgreSQL?
Yes.

> o What are versions of these softwares? Especially of PHP? Is it a
> PHP4? if so, what version? What is the "Php with UTF-8 extensions"?
> I've never heard of it.
It is PHP 4.0.6 built with:
--enable-mbstring: enables the mbstring functions. This option is required
to use them.
--enable-mbstr-enc-trans: enables HTTP input character encoding conversion
using the mbstring conversion engine. If this feature is enabled, HTTP input
may be converted to mbstring.internal_encoding automatically.

Now, some more information:
1) Dutch text was entered using IE5.5. It is not faulty.

2) Japanese text was entered using the latest OpenOffice release (sorry, I
said IE5 but I was wrong), saved as UTF-8 and imported into PostgreSQL. Only
the Japanese data has problems.

3) When opening a faulty Japanese record using Apache/IE5, the record is
displayed correctly. Each faulty character is replaced by a Japanese 30A7
glyph (looks like a French cross with two horizontal lines). What is this
glyph? Does it mean 'I don't know' in Japanese?

The record is saved correctly using this 30A1 glyph (it then looks fixed,
since I can dump it and import it into 7.2, but this is not a solution).

4) In the original PostgreSQL 7.1.3 dump, there is only one faulty UTF-8
character, repeated 700 times. If you open my file in Yudit, it is displayed
as =E3=82'. Why is it always the same character everywhere? Maybe you could
have a look at my source file again. It sounds like a bug (in OpenOffice or
PostgreSQL).
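
For reference, a minimal Python sketch of what the bytes in point 4 would mean
(not taken from my dump; it assumes that the "=E3=82" shown by Yudit stands for
the raw bytes 0xE3 0x82, and the helper names are made up for illustration).
It checks how many bytes the lead byte announces, whether the two bytes alone
decode as UTF-8, and how U+30A1 and U+30A7 are encoded:

```python
# Illustrative sketch only. Assumption: "=E3=82" means the raw bytes 0xE3 0x82.

def expected_length(lead_byte: int) -> int:
    """Sequence length announced by a UTF-8 lead byte (0 if not a lead byte)."""
    if lead_byte < 0x80:
        return 1
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    return 0


def is_valid_utf8(data: bytes) -> bool:
    """True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


truncated = bytes([0xE3, 0x82])
print(expected_length(truncated[0]))   # bytes announced by the lead byte 0xE3
print(is_valid_utf8(truncated))        # do the two bytes decode on their own?
print("\u30a1".encode("utf-8").hex())  # UTF-8 bytes of U+30A1
print("\u30a7".encode("utf-8").hex())  # UTF-8 bytes of U+30A7
```

If the assumption holds, the sequence would be a three-byte character whose
last byte is missing, and both 30A1 and 30A7 share that same two-byte prefix.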

5) Surrogate pairs
I heard that PostgreSQL does not support surrogate pairs. Is this a surrogate
pair problem? Just my two cents; I know very little about UTF-8.
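
A small Python sketch of when surrogate pairs come into play (an illustration
only, with made-up helper names; it says nothing about what PostgreSQL itself
does). Surrogate pairs are a UTF-16 device for code points above U+FFFF, which
take four bytes in UTF-8. U+20000 is used here only as an arbitrary example of
such a code point:

```python
# Illustrative sketch only: compares a katakana code point with one above U+FFFF.

def needs_surrogate_pair(ch: str) -> bool:
    """True if the code point lies above U+FFFF, so UTF-16 needs two units."""
    return ord(ch) > 0xFFFF


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units used to encode the character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


for ch in ("\u30a1", "\U00020000"):
    print(
        f"U+{ord(ch):04X}",
        "surrogate pair:", needs_surrogate_pair(ch),
        "UTF-16 units:", utf16_units(ch),
        "UTF-8 bytes:", len(ch.encode("utf-8")),
    )
```

Code points in the 30A0 range fit in a single UTF-16 unit and three UTF-8
bytes, so on this reading they would not involve surrogate pairs.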

Any help appreciated,
Thanks, Jean-Michel POURE

