Re: Sequence of characters not supported by psql/pg_dump

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: sminne(at)allis(dot)fr
Cc: pgsql-bugs(at)postgresql(dot)org, pbittner(at)allis(dot)fr
Subject: Re: Sequence of characters not supported by psql/pg_dump
Date: 2001-02-24 13:05:43
Message-ID: 20010224220543M.t-ishii@sra.co.jp
Lists: pgsql-bugs

> On Thu, 11 Jan 2001, Tatsuo Ishii wrote:
> > > I use PostgreSQL 7.0.2 on linux.
> > > The base was set with initdb -E UNICODE.
> > >
> > > I have many Strings with accents (french language).
> > > Some of them aren't supported by queries or pg_dump:
> > >
> > > WORKING EXAMPLE:
> > > DB=# select * from element_attribute where java_lang_string like 'Scholt_s';
> >
> > Are you sure that the letter è (LATIN SMALL LETTER E WITH GRAVE) is
> > encoded in UTF-8? It's 2 bytes long and starting with 0xc...
> > --
> > Tatsuo Ishii
>
> Sorry for the delay. We had very tight production constraints, and I had to let it go for a while.
> Well, how can I find out the internal encoding of this letter in the database?
> The texts were inserted from command-line inserts. When inserted from JDBC (so it's supposed to be Unicode),
> I can't see any difference, even in the generated dump file.
>
> Today I saw a post about the same problem that gives a way to produce a working dump: pg_dump -d $dumpfile.
>
> This command produces lines like this:
> INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350');
>
> I noticed that using the \xxx notation I can handle special characters in queries from the psql command line, too.
> But it doesn't look like Unicode encoding (cf. www.unicode.org/charts and Latin-1 Supplement), as I was expecting.
>
> So, I have the following questions:
> - What kind of code is this ?

Probably ISO 8859-1.

> - can I get the translation chart somewhere ?
> - why isn't it UNICODE ?

Because you didn't input as UTF-8.
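
To illustrate the two answers above, here is a minimal Python sketch (not anything PostgreSQL itself runs). It assumes the \350 in the dump line is an octal escape for a single byte, decodes that byte as ISO 8859-1, and compares it with the UTF-8 encoding of the same character. The variable names are made up for this example.

```python
# The dump line contains the escape \350; read it as an octal byte value.
byte_value = int("350", 8)          # 232 decimal, 0xE8 hexadecimal
raw = bytes([byte_value])

# Interpreted as ISO 8859-1, the single byte 0xE8 is the letter "è".
as_latin1 = raw.decode("iso-8859-1")

# The same letter encoded as UTF-8 takes two bytes: 0xC3 0xA8.
as_utf8 = as_latin1.encode("utf-8")

# The lone byte 0xE8 is not a valid UTF-8 sequence on its own.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(hex(byte_value), as_latin1, as_utf8.hex(), valid_utf8)
```

A single byte 0xE8 matches the ISO 8859-1 code for è, while UTF-8 would need the two-byte sequence 0xC3 0xA8, which fits the earlier remark that the letter is "2 bytes long and starting with 0xc...".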

> - why do I have to use the \xxx code (and not 0x..., or the special character typed directly, like "è") from the psql command line,
> although it is supposed to support UNICODE ?
> - why don't we find this kind of character in the dump file when the -d option is not set ?
> (I assume this causes the restore to fail, and I think it could be considered a bug).
>
> Thank you for your help

In releases prior to 7.1, you need to input UTF-8 explicitly. 7.1
can automatically convert between ISO 8859-1 and UTF-8. That means
that if you type in characters in ISO 8859-1, PostgreSQL will convert
them to UTF-8 and then store them in the database.
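
As a rough illustration of that conversion, the Python sketch below re-encodes ISO 8859-1 bytes as UTF-8 and prints both forms as backslash-octal escapes. This is only a model of the idea, not the server's actual conversion code; latin1_to_utf8 and octal_escapes are hypothetical helper names, and it assumes the \xxx notation mentioned in the quoted message stands for one byte per escape.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are known to be ISO 8859-1 as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def octal_escapes(data: bytes) -> str:
    """Render every byte as a three-digit backslash-octal escape."""
    return "".join("\\{:03o}".format(b) for b in data)


# What a Latin-1 terminal sends for the letter "è": the single byte 0xE8.
client_bytes = "è".encode("iso-8859-1")

# What a UNICODE (UTF-8) database would need to store for the same letter.
stored_bytes = latin1_to_utf8(client_bytes)

print(octal_escapes(client_bytes))   # escape form of the Latin-1 byte
print(octal_escapes(stored_bytes))   # escape form of the UTF-8 bytes
```

Under these assumptions, the Latin-1 byte corresponds to \350 as seen in the dump, while the UTF-8 form of the same letter would be the two escapes \303\250.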
--
Tatsuo Ishii
