
Re: Unicode problem again

From:	Michael Fuhr <mike(at)fuhr(dot)org>
To:	Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc:	Garry Saddington EXTERN <garry(at)schoolteachers(dot)co(dot)uk>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Unicode problem again
Date:	2008-06-26 12:36:08
Message-ID:	20080626123607.GA75164@winnie.fuhr.org
Lists:	pgsql-general

On Tue, Jun 24, 2008 at 09:16:37AM +0200, Albe Laurenz wrote:
> Garry Saddington wrote:
> > ProgrammingError Error Value: ERROR: character 0xe28099 of
> > encoding "UTF8" has no equivalent in "LATIN1" select distinct
> [...]
>
> This is UNICODE 0x2019, a "right single quotation mark".
>
> This is a "Windows character" - the only non-UNICODE codepages I
> know that contain this character are the Microsoft codepages.
[...]
>
> > I have changed client_encoding to Latin1 to get over errors
> > caused by having the database in UTF8 and users trying to
> > enter special characters like £ signs.
> >
> > Unfortunately, it seems there are already UTF8 encodings in
> > the DB that have no equivalent in Latin1 from before the change.

Your input data seems to have a mix of encodings. Sometimes you're
getting pound signs in a non-UTF-8 encoding, but if characters like
<U+2019 RIGHT SINGLE QUOTATION MARK> got into the database when
client_encoding was set to UTF8, then at least some data must have
been in UTF-8. If you're not certain that all data will be in the
same encoding, then you might need to either detect the encoding
and set client_encoding accordingly, or convert the data to a common
encoding in the application before inserting it (I've had to do
this, sometimes on a line-by-line basis).
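The "convert in the application, line by line" approach could look roughly like the sketch below. It is only an illustration under assumptions: the helper names (to_text, normalize_lines, fits_latin1) are hypothetical, and the fallback order (strict UTF-8, then Windows-1252, then Latin-1) is one common heuristic, chosen here because Albe noted that U+2019 comes from the Microsoft codepages. It is not a reliable detector. Any byte sequence that happens to be valid UTF-8 is treated as UTF-8.

```python
# A minimal sketch (hypothetical helpers, not from the original post):
# decode each input line to text before inserting, so the application can
# always send UTF-8 with client_encoding = 'UTF8'.

# Encodings tried in order. UTF-8 is strict, so non-UTF-8 bytes such as a
# Latin-1 pound sign (0xA3) fail and fall through. cp1252 (Windows-1252)
# covers Latin-1's printable range plus "smart quotes" like U+2019 at 0x92.
# latin-1 maps every byte, so it is a last resort that never fails.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def to_text(raw: bytes) -> str:
    """Decode one line of unknown encoding using the fallback order above."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Unreachable in practice: latin-1 can decode any byte sequence.
    raise AssertionError("latin-1 decoding should never fail")


def normalize_lines(data: bytes) -> list[str]:
    """Guess the encoding per line, since one input may mix encodings."""
    return [to_text(line) for line in data.splitlines()]


def fits_latin1(text: str) -> bool:
    """True if the text could be sent with client_encoding = 'LATIN1'."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    mixed = b"price: \xa3 5\nit\xe2\x80\x99s here\nit\x92s there"
    for line in normalize_lines(mixed):
        print(repr(line), "fits LATIN1:", fits_latin1(line))
```

fits_latin1 returning False for U+2019 is the client-side analogue of the server's "has no equivalent in LATIN1" error: once such a character is stored, it can only be sent to a client whose encoding can represent it, such as UTF8.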

Setting client_encoding has implications for display as well as for
input: if the displaying application expects data in one encoding
but you give it data in a different encoding, then non-ASCII characters
might not display correctly.

--
Michael Fuhr

In response to

Re: Unicode problem again at 2008-06-24 07:16:37 from Albe Laurenz

Responses

Re: Unicode problem again at 2008-06-26 13:31:01 from Albe Laurenz
