Re: invalid byte sequence for encoding "UTF8":

From: Bastiaan Olij <lists(at)basenlily(dot)nl>
To: pgsql Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8":
Date: 2009-01-08 21:32:42
Message-ID: 496670FA.4020608@basenlily.nl
Lists: pgsql-novice

Hi Kulmacet,

An HTML page that does not specify a character set will most likely be
treated as ISO-8859-1 (Latin-1), though different browsers may default to
other sets. You can check the Content-Type header (or meta tag) of the
form page to find out what the character set is, but you'll need to map
it to a character set that Postgres knows and can convert to UTF-8.
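To see why the server rejects byte 0xa9, here is a minimal Python sketch. Python is used only for illustration; the PHP side is not modelled. In ISO-8859-1 that byte is the copyright sign. In UTF-8, however, a lone 0xa9 is a continuation byte with no lead byte, which is what PostgreSQL reports as an invalid byte sequence. The helper name `is_valid_utf8` is hypothetical.

```python
# Byte 0xa9 as a Latin-1 browser would send it for the copyright sign.
raw = b"\xa9"

# In ISO-8859-1 every byte maps to a character, so this always decodes.
text = raw.decode("latin-1")
print(text == "\u00a9")  # True: U+00A9 is the copyright sign


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(raw))                      # False: lone continuation byte
print(is_valid_utf8("\u00a9".encode("utf-8")))  # True: UTF-8 encodes it as c2 a9
```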

Anyway, you could set the client encoding to Latin-1 by issuing the SQL
command SET client_encoding TO 'LATIN1';
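Conceptually, once client_encoding is LATIN1 and the database is UTF8, the server re-encodes incoming Latin-1 bytes as UTF-8. The Python sketch below mimics that conversion. It is an illustration, not PostgreSQL's own code, and `latin1_to_utf8` and the sample form value are hypothetical. It also shows the catch: if the browser actually sent UTF-8, the same conversion silently double-encodes the data.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode Latin-1 bytes as UTF-8, roughly what the
    server does when client_encoding is LATIN1 and the database is UTF8."""
    return data.decode("latin-1").encode("utf-8")


# Hypothetical form input as a Latin-1 browser would send it.
form_value = b"Copyright \xa9 2009"
converted = latin1_to_utf8(form_value)
print(converted)  # b'Copyright \xc2\xa9 2009'

# If the page was really UTF-8, the bytes get converted a second time.
already_utf8 = "\u00a9".encode("utf-8")  # b'\xc2\xa9'
print(latin1_to_utf8(already_utf8))      # b'\xc3\x82\xc2\xa9' (a stray A-circumflex appears)
```

This is why the client encoding has to match what the browser really sends. Declaring the page as UTF-8 keeps the two in step.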

What might be easier and safer is to change the HTML side of things. I
don't know whether PHP also generates the form page in your case; if so,
you can simply put:
header('Content-type: text/html; charset=utf-8');
at the beginning of your code, before any output is sent.

Alternatively, you can put it in the HTML file itself:
<html .... >
<head>
...
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
....
</html>

Setting the character set of your form page to UTF-8 should result in
the browser returning the entered data as UTF-8 as well. Of course, you
need to ensure that any data in the HTML page is also encoded as UTF-8,
or the browser will display it incorrectly (only for non-ASCII
characters, i.e. byte values above 127, of course)!

If you've saved data in UTF-8 into your database, any data from your
database written back into the page will also be UTF-8, but you do need
to be careful with any string manipulation you do on these strings, as
not all characters in UTF-8 are a single byte and, as far as I am aware,
PHP ignores this fact completely. PHP's string functions treat each byte
as a single character, even if it's only part of a character.
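The byte-versus-character problem can be shown with a short Python sketch. Python keeps bytes and text separate, so the byte view stands in here for byte-oriented functions like PHP's strlen() and substr(); the PHP functions themselves are not called. The helper `truncate_utf8` is hypothetical. It assumes its input is already valid UTF-8 and cuts on a character boundary.

```python
text = "caf\u00e9"           # 4 characters
data = text.encode("utf-8")  # b'caf\xc3\xa9': 5 bytes

print(len(text))  # 4 (characters)
print(len(data))  # 5 (bytes, what a byte-based length function reports)

# Cutting at a byte offset can split a multi-byte character in half.
broken = data[:4]            # b'caf\xc3'
try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    print("broken sequence:", exc.reason)


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Hypothetical helper: cut valid UTF-8 to at most max_bytes bytes
    without leaving a partial character at the end."""
    cut = data[:max_bytes]
    # 'ignore' only drops the incomplete trailing sequence here, because the
    # input is assumed to be valid UTF-8 to begin with.
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


print(truncate_utf8(data, 4))  # b'caf'
```

Sending a string like `broken` to a UTF8 database would trigger the same "invalid byte sequence" error as in the original question.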

See this page for more info:

http://www.w3.org/International/O-HTTP-charset

P.S. Personally, I prefer setting the HTML page to UTF-8 and just being
a bit careful with what I do with the resulting data in PHP. Otherwise,
in HTML you can end up with a mix of character sets that will get you
into trouble in the long term.

Greetz,

Bas

ries van Twisk wrote:
> On Jan 8, 2009, at 3:08 PM, kulmacet101(at)kulmacet(dot)com wrote:
>>> On Jan 8, 2009, at 2:34 PM, kulmacet101(at)kulmacet(dot)com wrote:
>>>> All,
>>>>
>>>> I have a new postgresql<8.3.4> build on linux<CentOS5> with PHP
>>>> talking to this database. If I try and update or insert on data that
>>>> has special characters I get this error:
>>>>
>>>> ERROR: invalid byte sequence for encoding "UTF8": 0xa9
>>>> HINT: This error can also happen if the byte sequence does not match
>>>> the encoding expected by the server, which is controlled by
>>>> "client_encoding".
>>>> STATEMENT: UPDATE preferences SET property = $1,preference_value =
>>>> $2,comment = $3,topic = $4 WHERE app_hash =
>>>> '50e2606ed950e8021d64349b49f4ee48'
>>>>
>>>> I have read some articles about client_encoding but I do not know how
>>>> to get around this error.
>>>>
>>>> Any help or support appreciated.
>>>> Thanks in advance,
>>>> Kulmacet
>>>
>>> Kulmacet,
>>>
>>> is your source data also in UTF-8 ?
>>>
>>> Ries
>>
>> I'm not sure how to determine if the source data is UTF-8. This data is
>> coming from a post to a form.
>
> Your website might not be in UTF-8 in that case.
>
> Ries

-- 
Kindest Regards,
Bastiaan Olij
e-mail/MSN: bastiaan(at)basenlily(dot)nl
web: http://www.basenlily.nl
Skype: Mux213
http://www.linkedin.com/in/bastiaanolij
