Re: COPY-ing ASCII file into UTF-8 database

From: Emanuel Calvo Franco <postgres(dot)arg(at)gmail(dot)com>
To: Toomas Vendelin <pg(at)vendelin(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: COPY-ing ASCII file into UTF-8 database
Date: 2009-04-06 13:56:42
Message-ID: f205bb120904060656j517852f2iec81fa8e3fe0f243@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

2009/4/6 Toomas Vendelin <pg(at)vendelin(dot)com>:
> THE ISSUE:
> I need to import a text file (ASCII) dumped from MySQL 3.23 database into
> Postgre table (UTF-8). The file contains "specifically Estonian characters"
> Õ, Ä, Ö, Ü. My attempt to COPY this file directly failed. Postgre said:
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0xf5706509
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> OK... The file isn't big, and there are just 4 characters causing problems,
> so I substitued each of them with a distinctive pattern of ASCII characters
> and successfully COPY-ied the file into Postgre table. With PgAdmin I've
> changed 1 record back to Estonian – worked fine. I then exported this table
> into a file, opened the file in UltraEdit (the status bar showed it's a
> UTF-8 indeed) and changed a few temporary substitutions with "specifically
> Estonian characters". Then I cleared the table and tried to COPY data from
> my now UTF-8 file again. I've got an error:
>
> ERROR:  invalid input syntax for integer: "2"
> CONTEXT:  COPY school, line 1, column sch_id: "2"
>
> PgAdmin actually displayed it with something like chopped hyphen, which I've
> failed to copy into this message and which wasn't visible in the text editor
> even with "Show hidden characters".
>
> MY QUESTIONS:
>
> 1. What is the proper solution for importing data from plain ASCII files,
> that contain the type of characters mentioned above?
>
> 2. I'd like to fill the gaps (huge, obviously) in my knowlege about handling
> different charsets. Can you point me to some sane resource(s)? I'm sure,
> there's a lot of them. Sorry for slipping off the topic here.
>
> Regards,
>
> Toomas
>
>

In adition of set the local encoding (Client and Server), if you are
runnning +nix there is a command called iconv that let you convert
from one charset to another. Always i used it, i never had any
problems.

--
Emanuel Calvo Franco
Sumate al ARPUG !
(www.postgres-arg.org -
www.arpug.com.ar)
ArPUG / AOSUG Member
Postgresql Support & Admin

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message J. Carlos Muro 2009-04-07 11:23:57 pg_castalog.pg_tables does not show the real tablespace for tables
Previous Message Dimitri Fontaine 2009-04-06 09:36:32 Re: how to do prefix_range default value?