Re: UNICODE

From: Marko Kreen <marko(at)l-t(dot)ee>
To: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: UNICODE
Date: 2001-10-28 15:09:45
Message-ID: 20011028170945.A18241@l-t.ee
Lists: pgsql-admin pgsql-general pgsql-hackers

On Sun, Oct 28, 2001 at 02:34:49PM +0100, Jean-Michel POURE wrote:
>
> >psql uses your input literally - so is your console/xterm in
> >UNICODE/UTF8?
> Client: \encoding returns 'UNICODE'.
> Server: \list shows databases. All databases are UNICODE (except TEMPLATE0
> and TEMPLATE1 which are ASCII of course). I use a Mandrake 8.1 distribution
> and think my console is UNICODE.

You think? Try this:

$ echo "accepté" | od -c

If your terminal is in UTF-8, you should get:

0000000 a c c e p t 303 251 \n
0000011

If it is in ISO-8859-1:

0000000 a c c e p t 351 \n
0000010

It may also be in some other 8-bit encoding; in that case the byte
value shown for 'é' will be different.
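To see both byte patterns without depending on how the terminal is configured, here is a minimal sketch that writes the bytes explicitly with printf octal escapes (assuming a POSIX printf and od). Setting LC_ALL=C is an assumption added here so that od prints high bytes as octal codes rather than locale-dependent characters.

```shell
# UTF-8: "é" is the two-byte sequence \303\251 (0xC3 0xA9).
printf 'accept\303\251\n' | LC_ALL=C od -c
# ISO-8859-1: "é" is the single byte \351 (0xE9).
printf 'accept\351\n' | LC_ALL=C od -c

# The last od line is the total length in octal:
# 011 octal = 9 bytes (UTF-8), 010 octal = 8 bytes (ISO-8859-1).
printf 'accept\303\251\n' | wc -c
printf 'accept\351\n' | wc -c
```

The one-byte difference in the final offset is exactly the extra byte UTF-8 needs for 'é'.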

> >> As for me, I typed INSERT INTO source_content VALUES ('Permis de conduire
> >> accepté') in Psql.
> >As I said - psql does not do any conversion.
> The faulty query is: INSERT INTO test (source_content) VALUES ('Permis de
> conduire accepté');

Hmm. It may be a bug in the input routines: you give PostgreSQL a
1-byte 'é', it expects a 2-byte character and overflows somewhere.
Can you reproduce it on 7.1.3? Maybe it's fixed there; I can't
reproduce it.
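The mismatch suspected above can be illustrated with a UTF-8 validity check. This is a sketch, assuming an iconv (such as the GNU/glibc one) that exits with a non-zero status on malformed input; the function name is_utf8 is hypothetical. In UTF-8 the byte \351 is the lead byte of a three-byte sequence, so a lone \351 is rejected, while the pair \303\251 decodes as 'é'.

```shell
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
# Relies on iconv failing when it meets an invalid or truncated sequence.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

printf 'accept\303\251' | is_utf8 && echo "UTF-8 bytes: valid"
printf 'accept\351' | is_utf8 || echo "Latin-1 byte: not valid UTF-8"
```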

> I just can't believe that Psql is not UTF-8 compatible. It seems unreal as
> Psql is PostgreSQL #1 helper application. Should I use PostgreSQL MULE
> encoding to have automatic transcoding? What are the guidelines? I am
> completely lost.

psql and pg_dump are fine. Your problem is that you are not giving
psql and pg_exec/PHP UTF-8 strings, but strings in some ISO-8859-* encoding.

> >> Psql does not insert the data and I have to kill it manually. Can you
> >> reproduce this?
> >No. If it hangs this is a serious problem. Or did you simply
> >forget the final ';'? By the way, it does not seem valid SQL to me,
> >considering the table structure you provided earlier.
> Is it possible that my database is corrupted? I have used pg_dump several
> times to dump data from production server to development servers and
> conversely. Does pg_dump produce UTF8 output? What are the guidelines when
> using UTF-8: forget psql and pg_dump?

As I said, psql and pg_dump are fine; they do not touch your data
as it passes through them.

It may be that your whole database is in Latin-1, since you
inserted strings in that encoding, not UTF-8. The PostgreSQL
server does not touch your data either; only its comparison
functions do not work, because the strings are not in the
encoding you told it they are in.

The solution is to dump your data, convert it to UTF-8 with the
iconv utility, and reload it.
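The conversion step could look like the following sketch. It assumes the data really is ISO-8859-1 (use a different -f encoding otherwise) and that the dump is a plain-text SQL file; the function name latin1_dump_to_utf8 is hypothetical.

```shell
# Hypothetical helper: re-encode a plain-text SQL dump from ISO-8859-1 to UTF-8.
# Returns iconv's exit status, so a failed conversion is not silently ignored.
latin1_dump_to_utf8() {
  in_file=$1
  out_file=$2
  iconv -f ISO-8859-1 -t UTF-8 "$in_file" > "$out_file"
}
```

A possible sequence around it, with olddb, newdb and the file names as placeholders: `pg_dump olddb > dump.sql`, then `latin1_dump_to_utf8 dump.sql dump.utf8.sql`, then `createdb -E UNICODE newdb` and `psql -f dump.utf8.sql newdb`. Newer PostgreSQL releases call the encoding UTF8.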

To check this, run:

$ psql -c "SELECT source_content FROM table where ..." \
| od -c

Then check whether the odd characters are represented in
1 or 2 bytes.
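When the query returns many rows, it can help to dump only the lines that contain non-ASCII bytes. This is a minimal sketch; the function name show_non_ascii is hypothetical, and LC_ALL=C is assumed so that grep matches raw bytes and od prints octal codes.

```shell
# Hypothetical helper: keep only lines containing bytes >= 0x80 (octal 200-377)
# and dump them byte by byte for inspection.
show_non_ascii() {
  LC_ALL=C grep "$(printf '[\200-\377]')" | LC_ALL=C od -c
}
```

For example, `psql -A -t -c "SELECT source_content FROM test" | show_non_ascii` (-A for unaligned and -t for tuples-only output). A single code such as 351 per accented letter points to Latin-1; a pair such as 303 251 points to UTF-8.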

--
marko

In response to

  • Re: UNICODE at 2001-10-28 13:34:49 from Jean-Michel POURE

Responses

  • Re: UNICODE at 2001-10-28 15:37:48 from Jean-Michel POURE