Re: UTF-8 data migration problem in Postgresql 7.2

From: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-odbc(at)postgresql(dot)org, Inoue(at)tpf(dot)co(dot)jp
Subject: Re: UTF-8 data migration problem in Postgresql 7.2
Date: 2002-02-21 09:13:23
Message-ID: 200202210913.g1L9DNFP032755@www1.translationforge
Lists: pgsql-hackers pgsql-odbc

Dear Tatsuo,

Thank you for your previous answer.

> o Were server/client encodings UTF-8 for PostgreSQL?
Yes.

> o What are the versions of this software, especially of PHP? Is it
> PHP4? If so, what version? What is the "Php with UTF-8 extensions"?
> I've never heard of it.
It is PHP 4.0.6 configured with:
--enable-mbstring: enables the mbstring functions. This option is required to
use mbstring functions.
--enable-mbstr-enc-trans: enables HTTP input character encoding conversion
using the mbstring conversion engine. If this feature is enabled, HTTP input
character encoding may be converted to mbstring.internal_encoding
automatically.

Now, some more information:
1) Dutch text was entered using IE5.5. It is not faulty.

2) Japanese text was entered using the latest release of OpenOffice (sorry, I
said IE5 but I was wrong), saved as UTF-8 and imported into PostgreSQL. Only
the Japanese data has problems.

3) When opening a faulty Japanese record using Apache/IE5, the record is
displayed correctly. Each faulty character is replaced by a Japanese 30A7
glyph (it looks like a French cross with two horizontal lines). What is this
glyph? Does it mean 'I don't know' in Japanese?

The record is saved correctly using this 30A1 glyph (it then looks as if it is
fixed, as I can dump it and import it in 7.2, but this is not a solution).
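
For reference, the two code points mentioned above (30A7 and 30A1) can be
looked up in the Unicode character database. The following is only a minimal
sketch, assuming Python 3 is available; Python is not part of the setup
described in this message, and the helper name describe() is hypothetical.

```python
import unicodedata


def describe(code_point: int) -> tuple[str, str]:
    """Return (Unicode name, UTF-8 bytes as hex) for a single code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), ch.encode("utf-8").hex()


# The two code points quoted in this message.
for cp in (0x30A1, 0x30A7):
    name, utf8_hex = describe(cp)
    print(f"U+{cp:04X}  {name}  utf-8={utf8_hex}")
```

Both code points are small katakana letters, and both encode to three UTF-8
bytes that begin with E3 82.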

4) In the original PostgreSQL 7.1.3 dump, there is only one faulty UTF-8
character, repeated 700 times. If you open my file in Yudit, it is displayed
as =E3=82'. Why is it always the same character everywhere? Maybe you could
have a look at my source file again. It sounds like a bug (in OpenOffice or
PostgreSQL).
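
One way to locate such sequences in a dump is to let a strict UTF-8 decoder
report where it fails. The sketch below assumes Python 3 and a dump small
enough to read into memory; the function name find_invalid_utf8() and the
sample bytes are hypothetical and only imitate the =E3=82' pattern described
above (a 3-byte lead byte E3, one continuation byte 82, then a byte that is
not a continuation byte).

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, offending_bytes) for each invalid UTF-8 sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first malformed sequence.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning right after the bad bytes
    return problems


# Hypothetical sample: valid katakana, then a truncated sequence E3 82
# followed by an ASCII apostrophe, then plain ASCII.
sample = "\u30c6\u30b9\u30c8".encode("utf-8") + b"\xe3\x82'" + b"abc"
for offset, bad in find_invalid_utf8(sample):
    print(f"offset {offset}: {bad.hex()}")
```

For a real dump, the bytes would come from something like
open(path, "rb").read(), where path is whatever file holds the 7.1.3 dump.
Slicing on every error is simple but not fast; it is adequate for a few
hundred faulty sequences.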

5) Surrogate pairs
I heard that PostgreSQL does not support surrogate pairs. Is this a surrogate
pair problem? Just my two cents; I know very little about UTF-8.
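
As background for the surrogate pair question: surrogate pairs are a UTF-16
mechanism for code points above U+FFFF, and such code points take four bytes
in UTF-8. The sketch below, assuming Python 3, checks this for the 30A1 code
point mentioned earlier; the helper names are hypothetical, and U+20000 is
used only as an example of a code point outside the Basic Multilingual Plane.

```python
def needs_surrogate_pair(ch: str) -> bool:
    """True if the character lies above U+FFFF (needs two UTF-16 units)."""
    return ord(ch) > 0xFFFF


def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 code unit count) for one character."""
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return utf8_bytes, utf16_units


for ch in ("\u30a1", "\U00020000"):
    utf8_bytes, utf16_units = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: utf-8 bytes={utf8_bytes}, "
          f"utf-16 units={utf16_units}, "
          f"surrogate pair={needs_surrogate_pair(ch)}")
```

A 3-byte sequence starting with E3 always decodes to a code point below
U+FFFF, so the faulty E3 82 sequence by itself does not point to surrogate
pairs.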

Any help appreciated,
Thanks, Jean-Michel POURE
