From: | Andrew McMillan <andrew(at)morphoss(dot)com> |
---|---|
To: | Jorge Miranda Castañeda <jmirandac(dot)85(at)gmail(dot)com> |
Cc: | pgsql-php(at)postgresql(dot)org |
Subject: | Re: Problem with utf8 encoding |
Date: | 2009-12-03 09:35:14 |
Message-ID: | 1259832914.8024.823.camel@happy.home.mcmillan.net.nz |
Lists: | pgsql-php |
On Thu, 2009-12-03 at 02:00 -0500, Jorge Miranda Castañeda wrote:
> Hello everyone!
>
>
> I'm working on a project using Postgres, Propel, and PHP.
>
>
> My development environment is:
> OS: Windows Vista Business SP2
> Postgres: Postgres v8.4
> Propel: Propel generator/runtime v1.4
> PHP: PHP v5.3
>
>
> Currently I'm struggling with a problem caused by the encoding.
> Every time I try to insert a row into the table CURRENCY, which has ID,
> DESC, and SYMBOL as its columns, I get the following error:
> Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]:
> Character not in repertoire: 7 ERROR: invalid byte sequence for
> encoding "UTF8": 0x80 HINT: This error can also happen if the byte
> sequence does not match the encoding expected by the server, which is
> controlled by "client_encoding".]
>
>
> I've created the database using this statement:
> CREATE DATABASE sbs
>   WITH OWNER = sbsadmin
>        ENCODING = 'UTF8'
>        LC_COLLATE = 'Spanish_Peru.1252'
>        LC_CTYPE = 'Spanish_Peru.1252'
>        CONNECTION LIMIT = -1;
Hola Jorge,
I suspect it's the LC_COLLATE and LC_CTYPE that you have there. I don't
*know* this, but they *look* like some weird sort of collation/ctype
based on the misguided Windows-1252 encoding rather than on UTF-8. Sadly,
Windows submits data in this encoding from web forms even where the
accept-charset is supposedly only ISO-8859-1.
In Windows-1252 the Euro currency symbol is 0x80, which is exactly the
byte reported in your error message. A lone 0x80 is not a valid UTF-8
sequence, so a server expecting UTF-8 rejects it.
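The byte values can be checked quickly. This is a minimal sketch in Python (used here purely for illustration, it is not part of Jorge's stack), relying only on the standard `cp1252` and `utf-8` codecs:

```python
# The single byte from the error message.
raw = b"\x80"

# Interpreted as Windows-1252, 0x80 is the Euro sign (U+20AC).
euro = raw.decode("cp1252")
print(euro == "\u20ac")

# Interpreted as UTF-8, a lone 0x80 is a continuation byte with no
# lead byte, so strict decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# The same character correctly encoded as UTF-8 takes three bytes.
utf8_euro = euro.encode("utf-8")
print(utf8_euro)
```

If the value for SYMBOL arrives at the server as the single byte 0x80 while the connection claims UTF-8, that would be consistent with the error shown above.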
I think you would be better off using a consistent locale like
es_PE.UTF-8, though if your data is 1252-encoded then you might need to
iconv it to UTF-8 first.
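As a rough illustration of what that conversion step does, here is a sketch in Python rather than PHP. The function name `cp1252_to_utf8` and the sample values are hypothetical, and the sketch assumes the input really is Windows-1252 throughout:

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Transcode bytes assumed to be Windows-1252 into UTF-8.

    Raises UnicodeDecodeError for the few byte values that
    Windows-1252 leaves undefined (for example 0x81).
    """
    return raw.decode("cp1252").encode("utf-8")


# Hypothetical column values as they might arrive from a Windows client.
symbol = cp1252_to_utf8(b"\x80")               # Euro sign
desc = cp1252_to_utf8(b"Nuevo sol peruano")    # plain ASCII is unchanged
print(symbol, desc)
```

An alternative worth checking, in line with the HINT in the error message, is to tell the server which encoding the client is actually sending (for example `SET client_encoding TO 'WIN1252';`) so that PostgreSQL performs the conversion itself.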
If you have data which is a mix of ISO-8859-1, Windows-1252 and UTF-8
then I can point you at a wee bit of PHP code I wrote which will look at
each character in a string and only iconv from 8859/1252 -> UTF-8 if it
is a high-bit byte which is not part of a valid UTF-8 character already.
The code is here:
http://repo.or.cz/w/awl.git/blob/HEAD:/inc/AWLUtilities.php
You need both of the last two functions - call the first one during
initialisation, and use the second one to clean the strings.
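For readers who cannot reach the link, the general idea can be sketched as follows. This is not the AWL code and does not reproduce its function names. It is an independent illustration in Python, and `clean_to_utf8` is a hypothetical name. It keeps any byte run that is already valid UTF-8 and converts only stray high-bit bytes from Windows-1252:

```python
def clean_to_utf8(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode a byte string that may mix UTF-8 with 8859-1/1252 bytes."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            # Plain ASCII: identical in all three encodings.
            out.append(chr(b))
            i += 1
            continue

        # Guess the sequence length from a possible UTF-8 lead byte.
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0  # continuation byte or invalid lead byte

        if length:
            try:
                # Strict decoding rejects truncated or malformed sequences.
                out.append(raw[i:i + length].decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass

        # Stray high-bit byte: treat it as a single fallback-encoded
        # character; bytes undefined in the fallback become U+FFFD.
        out.append(raw[i:i + 1].decode(fallback, errors="replace"))
        i += 1
    return "".join(out)


mixed = "\u20ac".encode("utf-8") + b" \x80 caf\xe9"
print(clean_to_utf8(mixed))
```

One limitation of any approach like this: bytes that happen to form a valid UTF-8 sequence are always kept as UTF-8, even if they were originally meant as two or three Windows-1252 characters.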
Cheers,
Andrew McMillan.
------------------------------------------------------------------------
andrew (AT) morphoss (DOT) com +64(272)DEBIAN
You will feel hungry again in another hour.
------------------------------------------------------------------------