Re: Problem with utf8 encoding

From: Andrew McMillan <andrew(at)morphoss(dot)com>
To: Jorge Miranda Castañeda <jmirandac(dot)85(at)gmail(dot)com>
Cc: pgsql-php(at)postgresql(dot)org
Subject: Re: Problem with utf8 encoding
Date: 2009-12-03 09:35:14
Message-ID: 1259832914.8024.823.camel@happy.home.mcmillan.net.nz
Lists: pgsql-php

On Thu, 2009-12-03 at 02:00 -0500, Jorge Miranda Castañeda wrote:
> Hello everyone!
>
>
> I'm working on a project using postgres, propel, and php.
>
>
> My development environment is:
> OS: Windows Vista Business SP2
> Postgres: Postgres v8.4
> Propel: Propel generator/runtime v1.4
> PHP: PHP v5.3
>
>
> Currently I'm struggling with a problem caused by the encoding.
> Every time I try to insert a row into the table CURRENCY, which has ID,
> DESC, and SYMBOL as its columns, I get the following error:
> Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]:
> Character not in repertoire: 7 ERROR: invalid byte sequence for
> encoding "UTF8": 0x80 HINT: This error can also happen if the byte
> sequence does not match the encoding expected by the server, which is
> controlled by "client_encoding".]
>
>
> I've created the database using this statement:
> CREATE DATABASE sbs
> WITH OWNER = sbsadmin
> ENCODING = 'UTF8'
> LC_COLLATE = 'Spanish_Peru.1252'
> LC_CTYPE = 'Spanish_Peru.1252'
> CONNECTION LIMIT = -1;

Hola Jorge,

I suspect the problem is the LC_COLLATE and LC_CTYPE settings you have
there. I don't *know* this, but they *look* like a collation/ctype based
on the misguided Windows-1252 encoding rather than on UTF-8. Sadly,
Windows submits data in this encoding from web forms even where the
accepted charset is supposedly only ISO-8859-1.

In Windows-1252 the Euro currency symbol is at 0x80, which is the byte
in your error message. On its own, 0x80 is not a valid UTF-8 sequence.
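
A quick way to check this, sketched in Python purely for illustration
(it is not part of your PHP/Propel stack): encode the Euro sign as
Windows-1252, then try to read that byte back as UTF-8.

```python
# The Euro sign (U+20AC) encoded as Windows-1252 is the single byte 0x80.
euro_cp1252 = "\u20ac".encode("cp1252")
print(euro_cp1252)

# The same character in UTF-8 is a three-byte sequence.
euro_utf8 = "\u20ac".encode("utf-8")
print(euro_utf8)

# A lone 0x80 is a continuation byte with no lead byte, so a strict
# UTF-8 decoder rejects it, much as the server does in the error above.
try:
    euro_cp1252.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print("0x80 rejected as UTF-8:", rejected)
```

So a Euro symbol typed into a form and sent as Windows-1252 would
produce exactly the kind of "invalid byte sequence" error you quoted.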

I think you would be better off using a consistent locale like
es_PE.UTF-8, though if your data is Windows-1252 encoded then you might
need to convert it with iconv first.
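
As a rough sketch of that conversion step, here it is in Python (only
for illustration; in PHP the iconv() function does the same job). The
function name cp1252_to_utf8 and the sample string are made up for this
example, and it assumes the input really is all Windows-1252.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are known to be Windows-1252 as UTF-8."""
    # Decode with the legacy code page, then encode the text as UTF-8.
    # Python's cp1252 codec raises UnicodeDecodeError on the five bytes
    # that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    return data.decode("cp1252").encode("utf-8")


# Hypothetical form value containing a Windows-1252 Euro sign (0x80).
legacy = b"Precio: 10 \x80"
converted = cp1252_to_utf8(legacy)
print(converted.decode("utf-8"))
```

This is only safe when none of the input is UTF-8 already; running it
over text that is already UTF-8 would double-encode the accented
characters.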

If you have data which is a mix of ISO-8859-1, Windows-1252 and UTF-8
then I can point you at a wee bit of PHP code I wrote which will look at
each character in a string and only iconv from 8859/1252 -> UTF-8 if it
is a high-bit byte which is not part of a valid UTF-8 character already.

The code is here:

http://repo.or.cz/w/awl.git/blob/HEAD:/inc/AWLUtilities.php

You need both of the last two functions - call the first one during
initialisation, and use the second one to clean the strings.
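
To give an idea of the approach without reading the PHP, here is a
minimal sketch of the same idea in Python. It is not the code from
AWLUtilities.php; the function names are invented for this example, and
falling back to ISO-8859-1 for the bytes Windows-1252 leaves undefined
is my assumption here.

```python
def _utf8_sequence_length(data: bytes, i: int) -> int:
    """Length of a valid multi-byte UTF-8 character starting at i, else 0."""
    for length in (2, 3, 4):
        chunk = data[i:i + length]
        if len(chunk) < length:
            break
        try:
            if len(chunk.decode("utf-8")) == 1:
                return length
        except UnicodeDecodeError:
            continue
    return 0


def _legacy_byte_to_utf8(byte: int) -> bytes:
    """Convert one high-bit byte from Windows-1252 (or ISO-8859-1) to UTF-8."""
    raw = bytes([byte])
    try:
        return raw.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        # Assumption: treat bytes undefined in Windows-1252 as ISO-8859-1.
        return raw.decode("latin-1").encode("utf-8")


def clean_mixed_to_utf8(data: bytes) -> bytes:
    """Keep valid UTF-8 as-is; convert stray high-bit bytes to UTF-8."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            out.append(byte)  # plain ASCII passes through
            i += 1
            continue
        length = _utf8_sequence_length(data, i)
        if length:
            out += data[i:i + length]  # already a valid UTF-8 character
            i += length
        else:
            out += _legacy_byte_to_utf8(byte)  # stray 8859/1252 byte
            i += 1
    return bytes(out)


# UTF-8 "café", a Windows-1252 Euro byte, and ISO-8859-1 "niño" mixed.
mixed = "caf\u00e9".encode("utf-8") + b" \x80 " + "ni\u00f1o".encode("latin-1")
cleaned = clean_mixed_to_utf8(mixed)
print(cleaned.decode("utf-8"))
```

Note that this kind of guessing has a blind spot: two legacy bytes that
happen to form a valid UTF-8 sequence are left alone, so it suits data
that is mostly one encoding with the odd stray byte of the other.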

Cheers,
Andrew McMillan.

------------------------------------------------------------------------
andrew (AT) morphoss (DOT) com +64(272)DEBIAN
You will feel hungry again in another hour.
------------------------------------------------------------------------
