Re: Problem with utf8 encoding

From: Andrew McMillan <andrew(at)morphoss(dot)com>
To: Jorge Miranda Castañeda <jmirandac(dot)85(at)gmail(dot)com>
Cc: pgsql-php(at)postgresql(dot)org
Subject: Re: Problem with utf8 encoding
Date: 2009-12-03 09:35:14
Message-ID: 1259832914.8024.823.camel@happy.home.mcmillan.net.nz
Lists: pgsql-php
On Thu, 2009-12-03 at 02:00 -0500, Jorge Miranda Castañeda wrote:
> Hello everyone!
> 
> 
> I'm working on a project using Postgres, Propel, and PHP.
> 
> 
> My development environment is:
> OS: Windows Vista Business SP2
> Postgres: Postgres v8.4
> Propel: Propel generator/runtime v1.4
> PHP: PHP v5.3
> 
> 
> Currently I'm struggling with a problem caused by the encoding.
> Every time I try to insert a row into the table CURRENCY, which has ID,
> DESC, and SYMBOL as its columns, I get the following error:
> Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]:
> Character not in repertoire: 7 ERROR: invalid byte sequence for
> encoding "UTF8": 0x80 HINT: This error can also happen if the byte
> sequence does not match the encoding expected by the server, which is
> controlled by "client_encoding".]
> 
> 
> I've created the database using this statement:
> CREATE DATABASE sbs
>   WITH OWNER = sbsadmin
>        ENCODING = 'UTF8'
>        LC_COLLATE = 'Spanish_Peru.1252'
>        LC_CTYPE = 'Spanish_Peru.1252'
>        CONNECTION LIMIT = -1;

Hola Jorge,

I suspect it's the LC_COLLATE and LC_CTYPE that you have there. I don't
*know* this, but they *look* like some weird sort of collation/ctype
based on the misguided Windows-1252 encoding.  Sadly, Windows submits
data in this encoding through web forms even where the accept charset
is supposedly only ISO-8859-1.

In Windows-1252 the Euro currency symbol is 0x80, which is exactly the
byte reported in your error message.
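To make that concrete, here is a small sketch (in Python purely for
illustration; the variable names are my own) that takes the byte from
the error message, shows that it is not valid UTF-8 on its own, and
shows what it becomes when read as Windows-1252 instead:

```python
# The byte reported in the PostgreSQL error message.
raw = b"\x80"

# As UTF-8, a lone 0x80 is a continuation byte with no lead byte before it,
# so strict decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# As Windows-1252 (Python codec name "cp1252"), the same byte is U+20AC,
# the Euro sign.
as_text = raw.decode("cp1252")

# Re-encoded as UTF-8, the Euro sign takes three bytes.
as_utf8 = as_text.encode("utf-8")

print(valid_utf8, as_utf8.hex())  # prints: False e282ac
```

So a client that sends the single byte 0x80 while the server expects
UTF-8 would be rejected, whereas the three-byte form would be accepted.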

I think you would be better off using a consistent locale like
es_PE.UTF-8, though if your data is 1252-encoded then you might need to
iconv it to UTF-8 first.

If you have data which is a mix of ISO-8859-1, Windows-1252 and UTF-8
then I can point you at a wee bit of PHP code I wrote which will look at
each character in a string and only iconv from 8859/1252 -> UTF-8 if it
is a high-bit byte which is not part of a valid UTF-8 character already.

The code is here:

 http://repo.or.cz/w/awl.git/blob/HEAD:/inc/AWLUtilities.php

You need both of the last two functions - call the first one during
initialisation, and use the second one to clean the strings.
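
For illustration only, the per-byte idea described above can be
sketched roughly as follows. This is a Python sketch written for this
note, not the PHP from AWLUtilities.php: the names clean_to_utf8 and
_legacy_byte_to_utf8 are hypothetical, and falling back to ISO-8859-1
for the few bytes that Windows-1252 leaves undefined is my own
assumption.

```python
def _legacy_byte_to_utf8(byte: int) -> bytes:
    """Convert one stray high-bit byte to UTF-8 (hypothetical helper)."""
    raw = bytes([byte])
    try:
        text = raw.decode("cp1252")
    except UnicodeDecodeError:
        # Assumption: bytes undefined in Windows-1252 are read as ISO-8859-1.
        text = raw.decode("latin-1")
    return text.encode("utf-8")


def clean_to_utf8(data: bytes) -> bytes:
    """Return data as valid UTF-8, converting only bytes that are not
    already part of a valid UTF-8 sequence."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # Plain ASCII is identical in all three encodings.
            out.append(byte)
            i += 1
            continue
        # Work out how long a UTF-8 sequence starting here would be.
        if 0xC2 <= byte <= 0xDF:
            length = 2
        elif 0xE0 <= byte <= 0xEF:
            length = 3
        elif 0xF0 <= byte <= 0xF4:
            length = 4
        else:
            length = 0  # not a valid UTF-8 lead byte
        chunk = data[i:i + length]
        if length and len(chunk) == length:
            try:
                chunk.decode("utf-8")  # strict validation of the sequence
                out += chunk
                i += length
                continue
            except UnicodeDecodeError:
                pass
        # Stray high-bit byte: treat it as 8859-1/1252 and convert it.
        out += _legacy_byte_to_utf8(byte)
        i += 1
    return bytes(out)


# Mixed input: valid UTF-8 "e acute", a 1252 Euro sign, a Latin-1 "n tilde".
sample = b"caf\xc3\xa9 \x80 a\xf1o"
cleaned = clean_to_utf8(sample)
print(cleaned.hex())
```

One caveat with any heuristic of this kind: a pair of 1252 bytes that
happens to form a valid UTF-8 sequence is left alone, so the result is
a best guess rather than a guarantee.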

Cheers,
					Andrew McMillan.


------------------------------------------------------------------------
andrew (AT) morphoss (DOT) com                            +64(272)DEBIAN
              You will feel hungry again in another hour.
------------------------------------------------------------------------
