Unicode database question

From: Lynna Landstreet <lynna(at)gallery44(dot)org>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Unicode database question
Date: 2003-07-16 23:39:37
Message-ID: BB3B5A79.5A0%lynna@gallery44.org
Lists: pgsql-general

Hello,

I'm running into a bit of trouble with a Unicode-enabled PostgreSQL database
(some of the data consists of artist and/or image names in other languages,
like French, Spanish, German and Portuguese, which frequently have accents,
and I don't want people entering data to have to use ASCII codes). I thought
I had got past the import issues: I exported the text as Unicode, uploaded
the text files as binary instead of data to keep them Unicode/UTF-8 in
transit, and then used psql's \copy command to insert the data into the
database. But I can't get the special characters to display properly on the
web. :-(

I'm not even sure how to tell if the problem is on the input side or the
output side - as in, whether it's that the data in the database got muddled
on the way in and is not valid Unicode, or whether it's OK but every means I
try to use to view it doesn't want to accept Unicode. I'm pretty sure the
text files got to the server OK as Unicode, because I was able to view them
directly with a web browser and the special characters were OK then. But
when I imported them into the database, I was not then able to view the
special characters correctly, either in my browser through the PHP frontend
I'm developing for the database or phpPgAdmin, or via Telnet/SSH. So I don't
know if the problem came about somehow while using \copy to import them, or
with the means I'm using to view them.
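
One way to narrow down the input side is to test whether the bytes of the uploaded text file decode as UTF-8 at all, before \copy ever touches them. The following is only a minimal sketch: the function name and the sample strings are illustrative assumptions, not part of the setup described here.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if the whole byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Sample data for illustration: the same name in two encodings.
utf8_bytes = "Ascenção".encode("utf-8")
latin1_bytes = "Ascenção".encode("latin-1")

print(first_invalid_utf8_offset(utf8_bytes))    # valid UTF-8
print(first_invalid_utf8_offset(latin1_bytes))  # offset of the Latin-1 'ç'
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to the same function. A result of None for the file would suggest looking at the import or display steps instead.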

I've set the charset encoding of my PHP pages to UTF-8, and the default
encoding in my browser as well, but that doesn't help. And I've tried
editing the data through phpPgAdmin to restore the special characters, but
got the following error message:

Error - /[path to my web directory]/phpPgAdmin/tbl_replace.php -- Line: 77

PostgreSQL said: ERROR: Invalid UNICODE character sequence found (0xe7e36f)
Your query:
UPDATE "artists" SET "artist_id" = 485, "firstname" = 'Teresa', "lastname" =
'Ascenção'... [rest of query deleted]

Ironically, the accented characters in her last name (a c with a cedilla and
an a with a tilde, in case they don't show up here) displayed fine in the
error message! But it wouldn't enter them into the database.
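
The byte sequence in the error message may itself be a clue. Read as ISO-8859-1 (Latin-1), the bytes 0xe7 0xe3 0x6f spell "ção", which suggests, though it does not prove, that the form submitted the name in a single-byte encoding rather than UTF-8. A short Python sketch of that check, using only the bytes quoted in the error:

```python
# Bytes reported by PostgreSQL in the error message: 0xe7e36f
rejected = bytes.fromhex("e7e36f")

# Interpreted as Latin-1, each byte maps to one character.
as_latin1 = rejected.decode("latin-1")

# Interpreted as UTF-8, the sequence is malformed: 0xe7 opens a
# three-byte sequence, but 0xe3 is not a continuation byte.
try:
    rejected.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# What the same three characters look like when encoded as UTF-8.
expected_utf8 = "ção".encode("utf-8")

print(as_latin1)            # ção
print(valid_utf8)           # False
print(expected_utf8.hex())  # c3a7c3a36f
```

If that reading is right, the database would be correct to reject the input, and the mismatch would be between the encoding the client sends and the encoding the server expects.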

Questions that come to mind:

1. Does anyone have any idea what's going wrong here?
2. Can \copy reduce UTF-8 text to plain ASCII while importing data from a
text file?
3. If so, can it be made not to, maybe through adding some kind of parameter
to the command? Or is there a better way to import the data?
4. Is it correct for the database encoding to be "UNICODE" or should it be
UTF-8 specifically? My impression thus far was that Unicode and UTF-8 were
more or less the same thing, but maybe more or less isn't good enough.
5. Does a web form have to be specially coded to accept text with accented
characters into a database, or does the encoding of the database itself
and/or the web page the form is on determine that?

Any assistance would be much appreciated...

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
