Re: Importing data - possible UTF8 import bug?

From: "Mikel Lindsaar" <raasdnil(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Importing data - possible UTF8 import bug?
Date: 2008-07-11 08:37:39
Message-ID: 57a815bf0807110137j198cb9e9gcb7d07405c42f42e@mail.gmail.com
Lists: pgsql-admin

OK, I'm mailing the list the results of my problem so future people can find it.

The error was

ERROR: invalid byte sequence for encoding "UTF8": 0xa2

along with many similar errors reporting other 0x... byte values.

The problem was indeed a bug, but one that sat between the keyboard
and screen (that is, me), not in the COPY command. I didn't read
the COPY docs well enough. They clearly state that a backslash
followed by digits is interpreted as the character with that
numeric code (see the table of backslash sequences there).

As the data I was importing contained addresses, it had unit and
street numbers written like this: 2\554. This was being interpreted as
the number 2 followed by the character with code \554, which was an
invalid sequence, so COPY rightly failed and complained about
an invalid byte sequence.

Going through the data set and replacing the backslashes with forward
slashes (which works in my case) or, if you need to be non-destructive,
replacing each single backslash with a double backslash, fixes the
problem.
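
For anyone who wants to script this, here is a minimal sketch of both
fixes in Python. The function names and the sample address are made up
for illustration, and it assumes the data uses COPY's default text
format and does not rely on backslash sequences such as \N for NULL
(doubling would turn those into literal text too).

```python
def escape_copy_field(value: str) -> str:
    """Non-destructive fix: double every backslash so that COPY's
    text format reads it as one literal backslash."""
    return value.replace("\\", "\\\\")


def use_forward_slashes(value: str) -> str:
    """Destructive fix: swap backslashes for forward slashes."""
    return value.replace("\\", "/")


if __name__ == "__main__":
    # Hypothetical address field, unit 2 at street number 554.
    address = "2\\554 Example Street"
    print(escape_copy_field(address))
    print(use_forward_slashes(address))
```

Run either function over each field before writing the file that COPY
will load.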

Sorry all for the noise.

Mikel

--
http://lindsaar.net/
Rails, RSpec and Life blog....
