Re: Unicode problem again

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Michael Fuhr *EXTERN*" <mike(at)fuhr(dot)org>
Cc: "Garry Saddington *EXTERN*" <garry(at)schoolteachers(dot)co(dot)uk>, "pgsql-general General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unicode problem again
Date: 2008-06-26 13:31:01
Message-ID: D960CB61B694CF459DCFB4B0128514C2023F9895@exadv11.host.magwien.gv.at
Lists: pgsql-general

Michael Fuhr wrote:
> > > ProgrammingError Error Value: ERROR: character 0xe28099 of
> > > encoding "UTF8" has no equivalent in "LATIN1" select distinct
> > [...]
> >
> > This is UNICODE 0x2019, a "right single quotation mark".
> >
> > This is a "Windows character" - the only non-UNICODE codepages I
> > know that contain this character are the Microsoft codepages.
> [...]
> >
> > > I have changed client_encoding to Latin1 to get over errors
> > > caused by having the database in UTF8 and users trying to
> > > enter special characters like £ signs.
> > >
> > > Unfortunately, it seems there are already UTF8 encodings in
> > > the DB that have no equivalent in Latin1 from before the change.
>
> Your input data seems to have a mix of encodings: sometimes you're
> getting pound signs in a non-UTF-8 encoding, but if characters like
> <U+2019 RIGHT SINGLE QUOTATION MARK> got into the database when
> client_encoding was set to UTF8 then at least some data must have
> been in UTF-8.

Sorry, but that's not true.
That character is 0x92 in WINDOWS-1252.

So it could have been that client_encoding was (correctly) set to WIN1252
and the quotation mark was entered as a single byte character.
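
To check the byte values outside the database, here is a minimal sketch in Python. It assumes that Python's "cp1252" and "latin-1" codecs match what PostgreSQL calls WIN1252 and LATIN1 for this character; the variable names are only for illustration.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK
ch = "\u2019"

# In UTF-8 the character takes three bytes: the 0xe28099 from the error message.
utf8_bytes = ch.encode("utf-8")

# In Windows-1252 (Python codec "cp1252") it is a single byte.
win1252_bytes = ch.encode("cp1252")

# ISO 8859-1 (LATIN1) has no such character, so encoding fails.
try:
    ch.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A client sending the single byte 0x92 as WIN1252 ends up with the
# same three UTF-8 bytes once the text is converted to UTF-8.
roundtrip = b"\x92".decode("cp1252").encode("utf-8")

print(utf8_bytes.hex(), win1252_bytes.hex(), latin1_ok, roundtrip == utf8_bytes)
```

So a single-byte 0x92 sent with client_encoding WIN1252 would be stored as the same UTF-8 sequence that later cannot be converted to LATIN1.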

Yours,
Laurenz Albe
