Re: UTF-8 -> ISO8859-1 conversion problem

From: Cott Lang <cott(at)internetstaff(dot)com>
To: "J. Michael Crawford" <jmichael(at)gwi(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF-8 -> ISO8859-1 conversion problem
Date: 2004-10-30 14:52:31
Message-ID: 1099147950.3571.13.camel@localhost
Lists: pgsql-general

Thanks for the detailed reply, you've confirmed what I suspected. :)

I guess I have some work to do!

On Fri, 2004-10-29 at 10:19, J. Michael Crawford wrote:
> In my experience, there are just some characters that don't want to be
> converted, even if they appear to be part of the normal 8-bit character
> system. We went to Unicode databases to hold our Latin1 characters because
> of this. There was even a case where the client was cutting and pasting
> what looked like plain ASCII text into our database, and it just wouldn't
> take some of the letters, giving the same error you reported.
>
> I'm going to send a more detailed post on the topic, but in general,
> we've found that there are four things that need to be done (three, if
> you're not serving up web pages) for Latin1 characters to work on multiple
> platforms.
>
> 1. Create the database in Unicode so that it will hold anything you
> throw at it.
>
> 2. When importing data, set the encoding in the script that loads the
> data, or if there's no script, use the "SET CLIENT_ENCODING TO (encoding)"
> command. Setting the encoding in a tool like pgManager is not always
> enough. Use this to be sure.
>
> 3. When retrieving data in a Java application, the JVM encoding will
> vary from JVM to JVM, and no attempt on our part to change the JVM encoding
> or translate the encoding of the database strings has worked, either to or
> from the database. We spent weeks going through every permutation of
> getBytes("ISO-8859-1") and related calls we could find, but to no
> avail. The JVM will tell you it has a new encoding, but Postgres will
> return gibberish. You can translate the bytes, or get a translated string,
> but it's all the same garbage. The solution: set the client encoding
> manually through a JDBC prepared statement. Once you set the client
> encoding properly, all seems to be fine:
>
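> // dbCon is an already-open java.sql.Connection
> String dbEncoding = "anEncoding"; // use a real encoding, either returned
>                                   // from the JVM or explicitly stated
> PreparedStatement statement = dbCon.prepareStatement(
>         "SET CLIENT_ENCODING TO '" + dbEncoding + "'");
> statement.execute();
> statement.close(); // release the statement once the setting is applied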
>
> 4. If writing HTML for a web page, make sure the encoding of the web
> page matches the encoding of the strings you're throwing at it. So if you
> have a Linux JVM that has a "UTF-8" encoding, the web page will need the
> HTML equivalent:
>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>
> ---
>
> This is likely far more information than you require, but I thought I'd
> add it anyway so that the information is in the archives. It took us
> months to solve our problem, even with help from the postgres community, so
> I at least want the basics to be posted while I get my act together and
> write something with more detail.
>
> - Mike
>
>
> At 12:12 PM 10/29/2004, Cott Lang wrote:
> >ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1
> >
> >Running 7.4.5, I frequently get this error, and ONLY on this particular
> >character despite seeing quite a bit of 8-bit data. I don't really follow
> >why it can't be converted; it's the same character (239) in both character
> >sets. Databases are in ISO8859-1, JDBC driver is defaulting to UTF-8.
> >
> >Am I flubbing something up? I'm probably going to (reluctantly) convert
> >to UTF-8 in the database at some point, but it'd sure be nice if this
> >worked without that. :)
> >
> >thanks!
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 8: explain analyze is your friend
>
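
A note on the 0x00ef in Cott's error. One plausible reading (an assumption; the thread does not confirm it) is that 0xEF is a raw UTF-8 byte, not the code point 239. The character ï (U+00EF) is the single byte 0xEF in ISO-8859-1, but in UTF-8 it is the two bytes 0xC3 0xAF. A 0xEF byte in a UTF-8 stream instead starts a three-byte sequence for some character in U+F000..U+FFFF, and none of those exist in ISO-8859-1. The sketch below checks those byte patterns with the standard java.nio.charset classes (Java 7+ for StandardCharsets). It also reproduces the kind of "gibberish" Mike describes, where UTF-8 bytes are decoded as ISO-8859-1. The class and method names (Latin1Check, hex, fitsLatin1, utf8SequenceLength, misread) are hypothetical and exist only for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Latin1Check {

    // Formats bytes as space-separated hex, e.g. "C3 AF".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if every character in s has an ISO-8859-1 (Latin-1) representation.
    public static boolean fitsLatin1(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    // Length of the UTF-8 sequence a byte would start, judged by its bit
    // pattern only (overlong forms are not rejected). Returns 0 for a
    // continuation byte (10xxxxxx) or a byte that cannot start a sequence.
    public static int utf8SequenceLength(int lead) {
        if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: U+0000..U+007F
        if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx: U+0080..U+07FF (covers Latin-1's upper half)
        if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx: U+0800..U+FFFF (beyond Latin-1)
        if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: U+10000 and above
        return 0;
    }

    // Encodes s as UTF-8, then wrongly decodes those bytes as ISO-8859-1.
    public static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String iDiaeresis = "\u00EF"; // i with diaeresis, code point 239
        System.out.println("U+00EF as ISO-8859-1: "
                + hex(iDiaeresis.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("U+00EF as UTF-8:      "
                + hex(iDiaeresis.getBytes(StandardCharsets.UTF_8)));
        System.out.println("0xEF starts a UTF-8 sequence of "
                + utf8SequenceLength(0xEF) + " bytes");

        // Two example characters whose UTF-8 form begins with 0xEF.
        String[] samples = {"\uFEFF", "\uFFFD"}; // byte order mark, replacement character
        for (String s : samples) {
            System.out.printf("U+%04X as UTF-8: %s, fits ISO-8859-1: %b%n",
                    (int) s.charAt(0), hex(s.getBytes(StandardCharsets.UTF_8)), fitsLatin1(s));
        }

        // One character in, two characters out: the mismatch Mike describes.
        String garbled = misread(iDiaeresis);
        System.out.println("UTF-8 bytes read as ISO-8859-1: "
                + garbled.length() + " characters instead of 1");
    }
}
```

Run as-is, it should print EF for the ISO-8859-1 form and C3 AF for the UTF-8 form. Both sample characters should report false for ISO-8859-1. If the offending characters really do fall in that three-byte range, no client-side setting could fit them into an ISO8859-1 database. That would be consistent with Mike's first step of storing the data in a Unicode database.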
