COPY-ing ASCII file into UTF-8 database

From: Toomas Vendelin <pg(at)vendelin(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: COPY-ing ASCII file into UTF-8 database
Date: 2009-04-06 06:28:01
Message-ID: 2FAE86BD-E366-4F25-9CE0-3C089F761960@vendelin.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

THE ISSUE:
I need to import a text file (ASCII) dumped from MySQL 3.23 database
into Postgre table (UTF-8). The file contains "specifically Estonian
characters" Õ, Ä, Ö, Ü. My attempt to COPY this file directly
failed. Postgre said:

ERROR: invalid byte sequence for encoding "UTF8": 0xf5706509
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

OK... The file isn't big, and there are just 4 characters causing
problems, so I substitued each of them with a distinctive pattern of
ASCII characters and successfully COPY-ied the file into Postgre
table. With PgAdmin I've changed 1 record back to Estonian – worked
fine. I then exported this table into a file, opened the file in
UltraEdit (the status bar showed it's a UTF-8 indeed) and changed a
few temporary substitutions with "specifically Estonian characters".
Then I cleared the table and tried to COPY data from my now UTF-8 file
again. I've got an error:

ERROR: invalid input syntax for integer: "2"
CONTEXT: COPY school, line 1, column sch_id: "2"

PgAdmin actually displayed it with something like chopped hyphen,
which I've failed to copy into this message and which wasn't visible
in the text editor even with "Show hidden characters".

MY QUESTIONS:

1. What is the proper solution for importing data from plain ASCII
files, that contain the type of characters mentioned above?

2. I'd like to fill the gaps (huge, obviously) in my knowlege about
handling different charsets. Can you point me to some sane
resource(s)? I'm sure, there's a lot of them. Sorry for slipping off
the topic here.

Regards,

Toomas

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tommy Gildseth 2009-04-06 06:55:37 Re: COPY-ing ASCII file into UTF-8 database
Previous Message Scott Marlowe 2009-04-05 15:53:46 Re: Making field non case sensitive