From: | "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> |
---|---|
To: | "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Janine Sisk" <janine(at)furfly(dot)net> |
Cc: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Trouble with UTF-8 data |
Date: | 2008-01-18 08:00:21 |
Message-ID: | D960CB61B694CF459DCFB4B0128514C2CC26AD@exadv11.host.magwien.gv.at |
Lists: | pgsql-general |
Tom Lane wrote:
>> But I'm still getting this error when loading the data into the new
>> database:
>
>> ERROR: invalid byte sequence for encoding "UTF8": 0xeda7a1
>
> The reason PG doesn't like this sequence is that it corresponds to
> a Unicode "surrogate pair" code point, which is not supposed to
> ever appear in UTF-8 representation --- surrogate pairs are a kluge for
> UTF-16 to deal with Unicode code points of more than 16 bits.
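Python's strict UTF-8 codec applies the same rule as PostgreSQL here. The minimal sketch below (Python is only an illustration, not part of the thread) shows it rejecting the bytes from the error message. The `surrogatepass` error handler deliberately relaxes that rule and reveals the surrogate code point underneath. The helper name `is_valid_utf8` is hypothetical.

```python
# The three bytes from the error message.
bad = bytes([0xED, 0xA7, 0xA1])

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(bad))  # False: the bytes encode a surrogate code point

# 'surrogatepass' accepts encoded surrogates, exposing the code point itself.
print(hex(ord(bad.decode("utf-8", "surrogatepass"))))  # 0xd9e1

# A code point above 0xFFFF, encoded directly as 4 bytes, is accepted.
print(is_valid_utf8("\U00088400".encode("utf-8")))  # True
```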
0xEDA7A1 (UTF-8) corresponds to Unicode code point 0xD9E1, which,
when interpreted as a high surrogate and followed by a low surrogate,
would correspond to the UTF-16 encoding of a code point
between 0x88400 and 0x887FF (depending on the value of the low surrogate).
These code points lie in plane 8, which is unassigned, so they do not correspond to any character.
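The arithmetic can be checked with a short sketch. The function names are illustrative. The first function deliberately skips the surrogate check that a real decoder performs, so that it will decode 0xEDA7A1. The second applies the standard UTF-16 surrogate-pair formula, code point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00), at both ends of the low-surrogate range.

```python
def decode_3byte_utf8(b0: int, b1: int, b2: int) -> int:
    """Decode 1110xxxx 10xxxxxx 10xxxxxx without rejecting surrogates."""
    assert b0 & 0xF0 == 0xE0 and b1 & 0xC0 == 0x80 and b2 & 0xC0 == 0x80
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    assert 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = decode_3byte_utf8(0xED, 0xA7, 0xA1)
print(hex(cp))  # 0xd9e1

# Smallest and largest possible low surrogates give the ends of the range.
first = combine_surrogates(cp, 0xDC00)
last = combine_surrogates(cp, 0xDFFF)
print(hex(first), hex(last))  # 0x88400 0x887ff
print(first >> 16)            # 8, i.e. plane 8
```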
So - unless there is a flaw in my reasoning - there's something
fishy with these data anyway.
Janine, could you give us a hex dump of that line from the copy statement?
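If it helps, here is a minimal sketch for producing such a dump from the data file. The function name and command-line usage are illustrative assumptions, not a PostgreSQL tool. It reads the file in binary mode and reports every line that Python's strict UTF-8 decoder rejects. For each one it gives the line number, the byte offset of the first bad byte, and the whole line in hex.

```python
import sys

def find_invalid_utf8(path: str) -> list[tuple[int, int, str]]:
    """Hypothetical helper: (line number, byte offset, hex dump) for each non-UTF-8 line."""
    problems = []
    with open(path, "rb") as f:  # binary mode: no decoding on read
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                # e.start is the offset of the first byte the decoder rejected
                problems.append((lineno, e.start, raw.hex(" ")))
    return problems

if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, offset, dump in find_invalid_utf8(sys.argv[1]):
        print(f"line {lineno}, byte {offset}: {dump}")
```

Run it as, for example, `python find_bad_utf8.py dump.sql`. Both file names are placeholders.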
Yours,
Laurenz Albe