
Re: UTF-8 -> ISO8859-1 conversion problem

From: "J(dot) Michael Crawford" <jmichael(at)gwi(dot)net>
To: Cott Lang <cott(at)internetstaff(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: UTF-8 -> ISO8859-1 conversion problem
Date: 2004-10-29 17:47:37
Lists: pgsql-general
  Correction:  Four things that need to be done, THREE if you're not 
serving up html.  Sorry for the editing error.

         - Mike

At 01:19 PM 10/29/2004, J. Michael Crawford wrote:

>   In my experience, there are just some characters that don't want to be 
> converted, even if they appear to be part of the normal 8-bit character 
> system.  We went to Unicode databases to hold our Latin1 characters 
> because of this.  There was even a case where the client was cutting and 
> pasting ascii text into our database, and it just wouldn't take some of 
> the letters, giving the same error you reported.
>   I'm going to send a more detailed post on the topic, but in general, 
> we've found that there are four things that need to be done (four, if 
> you're not serving up web pages) for Latin1 characters to work on 
> multiple platforms.
>   1.  Create the database in Unicode so that it will hold anything you 
> throw at it.
>   2.  When importing data, set the encoding in the script that loads the 
> data, or if there's no script, use the "SET CLIENT_ENCODING TO 
> (encoding)" command.  Setting the encoding in a tool like pgManager is 
> not always enough.  Use this to be sure.
>   3.  When retrieving data in a java application, the JVM encoding will 
> vary from JVM to JVM, and no attempt on our part to change the JVM 
> encoding or translate the encoding of the database strings has worked, 
> either to or from the database.  We spent weeks going through every 
> permutation of getBytes("ISO-8859-1") and related calls we could find, but 
> to no avail.  The JVM will tell you it has a new encoding, but postgres 
> will return gibberish.  You can translate the bytes, or get a translated 
> string, but it's all the same garbage.  The solution: set the client 
> encoding manually through a jdbc prepared statement.  Once you set the 
> client encoding properly, all seems to be fine:
>// Use a real encoding, either returned from the JVM or explicitly stated.
>String DBEncoding = "anEncoding";
>PreparedStatement statement = dbCon.prepareStatement(
>    "SET CLIENT_ENCODING TO '" + DBEncoding + "'");
>statement.execute();  // the command only takes effect once it is executed
>   4.  If writing html for a web page, make sure the encoding of the web 
> page matches the encoding of the strings you're throwing at it.  So if 
> you have a Linux JVM that has a "UTF-8" encoding, the web page will need 
> the html equivalent:
><meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>   This is likely far more information than you require, but I thought I'd 
> add it anyway so that the information is in the archives.  It took us 
> months to solve our problem, even with help from the postgres community, 
> so I at least want the basics to be posted while I get my act together 
> and write something with more detail.
>         - Mike
>At 12:12 PM 10/29/2004, Cott Lang wrote:
> >ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1
> >
> >Running 7.4.5, I frequently get this error, and ONLY on this particular
> >character despite seeing quite a bit of 8 bit. I don't really follow why
> >it can't be converted, it's the same character (239) in both character
> >sets. Databases are in ISO8859-1, JDBC driver is defaulting to UTF-8.
> >
> >Am I flubbing something up?  I'm probably going to (reluctantly) convert
> >to UTF-8 in the database at some point, but it'd sure be nice if this
> >worked without that. :)
> >
> >thanks!
> >
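
One thing worth checking for the error quoted above: code point 239 (ï) is a single byte 0xEF in ISO8859-1, but in UTF-8 it is stored as two bytes, and a byte 0xEF in UTF-8 instead starts a three-byte sequence. The thread does not establish that this is what happened here, so treat the following only as a small illustration of the difference. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Render a byte array as space-separated lowercase hex, e.g. "c3 af".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00ef"; // LATIN SMALL LETTER I WITH DIAERESIS, code point 239

        // Same character, different byte sequences in the two encodings.
        System.out.println("ISO-8859-1: " + hex(s.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));

        // A lone 0xEF byte is not a complete UTF-8 sequence, so decoding it
        // as UTF-8 does not give the original character back.
        String misread = new String(new byte[] {(byte) 0xef}, StandardCharsets.UTF_8);
        System.out.println("lone 0xEF decodes to the original: " + misread.equals(s));
    }
}
```

The first line prints `ef` and the second `c3 af`, which is why "the same character (239) in both character sets" does not mean the same bytes on the wire.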

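Step 3 in the quoted message (setting the client encoding over JDBC) could be wrapped up roughly as follows. This is a sketch, not the author's actual code: the class name, the method names and the check on the encoding name are assumptions added for illustration, and `dbCon` is assumed to be an already-open `java.sql.Connection`. Because the encoding name is concatenated into the SQL text, the sketch refuses anything that does not look like a plain encoding name.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class ClientEncoding {
    // Build the SET command. The name is concatenated into the SQL text,
    // so only letters, digits, '_' and '-' are accepted.
    static String setClientEncodingSql(String encoding) {
        if (encoding == null || !encoding.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected encoding name: " + encoding);
        }
        return "SET CLIENT_ENCODING TO '" + encoding + "'";
    }

    // Execute the command on an open connection; the statement is closed
    // automatically when the try block exits.
    static void apply(Connection dbCon, String encoding) throws SQLException {
        try (Statement st = dbCon.createStatement()) {
            st.execute(setClientEncodingSql(encoding));
        }
    }
}
```

It would be called once, right after the connection is opened, for example `ClientEncoding.apply(dbCon, "LATIN1")`. As far as I know, later versions of the PostgreSQL JDBC driver expect client_encoding to stay at UTF8, so this applies to setups like the 7.4-era one discussed in this thread and should be checked against the driver documentation before being used elsewhere.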