
Re: UTF-8 data migration problem in Postgresql 7.2

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: jm(dot)poure(at)freesurf(dot)fr
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-odbc(at)postgresql(dot)org, Inoue(at)tpf(dot)co(dot)jp
Subject: Re: UTF-8 data migration problem in Postgresql 7.2
Date: 2002-02-21 09:31:54
Lists: pgsql-hackers, pgsql-odbc
> > o Were server/client encodings UTF-8 for PostgreSQL?
> Yes.
> > o What are the versions of this software? Especially PHP? Is it a
> > PHP4? if so, what version? What is the "Php with UTF-8 extensions"?
> > I've never heard of it.
> It is PHP 4.0.6 with :
> --enable-mbstring : Enable mbstring functions. This option is required to use 
> mbstring functions. 
> --enable-mbstr-enc-trans : Enable HTTP input character encoding conversion 
> using mbstring conversion engine. If this feature is enabled, HTTP input 
> character encoding may be converted to mbstring.internal_encoding 
> automatically. 

Oh, that's a general functionality for handling multibyte characters,
not only for UTF-8. What are settings for mbstring in php.ini?
(entries begin with "mbstring.")

BTW, PHP4.0.6 is very buggy when used with PostgreSQL (random
crashes). I recommend upgrading to 4.1.1.

> Now, some more information:
> 1) Dutch text was entered using IE5.5. It is not faulty.

I assume the web page's encoding was UTF-8.

> 2) Japanese text was entered using OpenOffice latest release (sorry, I said 
> IE5 but I was wrong), saved under UTF-8 and imported in PostgreSQL. Only 
> Japanese data has problems. 

Can I take a look at the UTF-8 text generated by OpenOffice?

> 3) When opening a faulty Japanese record using Apache/IE5, the record is 
> displayed correctly. Each faulty character is replaced by a Japanese 30A7 
> gryph (looks like a French cross with two horizontal lines). What is this 
> gryph? Does it mean 'I don't know' in Japanese?

What do you mean by "gryph"? Is 30A7 an EUC-JP code?
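
For reference, the same character has different byte values depending on the encoding, which is why it matters whether 30A7 is a Unicode code point or an EUC-JP code. The following is a minimal sketch, assuming 30A7 is meant as the Unicode code point U+30A7 (the original message does not say so explicitly); the variable names are illustrative only.

```python
import unicodedata

# Assumption: "30A7" in the quoted message is the Unicode code point U+30A7.
ch = chr(0x30A7)

# Official Unicode character name for this code point.
name = unicodedata.name(ch)

# The same character encoded in the two encodings under discussion.
utf8_bytes = ch.encode("utf-8")
eucjp_bytes = ch.encode("euc_jp")

def as_hex(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

print(name)
print("UTF-8 :", as_hex(utf8_bytes))
print("EUC-JP:", as_hex(eucjp_bytes))
```

Under that assumption the character is KATAKANA LETTER SMALL E, stored as E3 82 A7 in UTF-8 and as A5 A7 in EUC-JP, so a bare "30A7" matches neither byte sequence and is most naturally read as a code point.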

> The record is saved correctly using this 30A1 gryph (then it looks like it is 
> fixed as I can dump it and import it in 7.2, but this is not a solution).

Again, what is "gryph"?

> 4) In PostgreSQL 7.1.3 original dump, there is only one faulty UTF-8 
> character repeated 700 times. If you open my file in Yudit, it is displayed 
> as =E3=82' Why is it always the same character everywhere? Maybe you could 
> have a look at my source file again. Sounds like a bug (Open Office or 
> PostgreSQL).
> 5) Surrogate pairs
> I heard PostgreSQL did not support surrogate pairs. Is this a problem of 
> surrogate pair? Just my 0.02 cents, I know very little about UTF-8.

I don't think so.
Tatsuo Ishii
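
As a rough way to inspect the bytes discussed in point 4, the sketch below assumes that the "=E3=82" shown by Yudit is quoted-printable notation for the two bytes E3 82 (an assumption; the dump itself is not reproduced here). It checks whether those bytes are valid UTF-8 on their own, which code points result if a third byte A1 or A7 is appended, and whether those code points would need surrogate pairs in UTF-16.

```python
import quopri

# Assumption: "=E3=82" is quoted-printable for the raw bytes E3 82.
raw = quopri.decodestring(b"=E3=82")

# E3 is the lead byte of a three-byte UTF-8 sequence, so two bytes
# alone are an incomplete sequence and strict decoding fails.
try:
    raw.decode("utf-8")
    is_valid = True
except UnicodeDecodeError:
    is_valid = False

# A lenient decoder substitutes U+FFFD (REPLACEMENT CHARACTER) instead.
replaced = raw.decode("utf-8", errors="replace")

# Completing the sequence with A1 or A7 gives the two code points
# mentioned in point 3 of the quoted message.
completed = [(raw + bytes([last])).decode("utf-8") for last in (0xA1, 0xA7)]
code_points = [f"U+{ord(c):04X}" for c in completed]

# Only code points above U+FFFD's plane limit (U+FFFF) need surrogate
# pairs in UTF-16; UTF-8 itself never uses surrogates.
needs_surrogates = [ord(c) > 0xFFFF for c in completed]

print("valid UTF-8:", is_valid)
print("completed code points:", code_points)
print("need surrogate pairs:", needs_surrogates)
```

Both 30A1 and 30A7 share the leading bytes E3 82 in UTF-8 and lie in the Basic Multilingual Plane, which is consistent with a truncated three-byte sequence rather than a surrogate-pair issue. This is an observation about the byte values only, not a diagnosis of where the truncation happened.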
