Re: encoding confusion

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Sim Zacks *EXTERN*" <sim(at)compulab(dot)co(dot)il>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: encoding confusion
Date: 2008-06-11 05:58:16
Message-ID: D960CB61B694CF459DCFB4B0128514C20230A1CE@exadv11.host.magwien.gv.at
Lists: pgsql-general

Sim Zacks wrote:
> We originally tested it on mysql and now we are migrating it
> to postgresql.
>
> The messages are stored in a longblob field on mysql and a bytea field
> in postgresql.
>
> I set the database up as UTF-8, even though we get emails that are not
> UTF encoded, mostly because I didn't know what else to try that would
> incorporate all the possible encodings. Examples of 3 encodings we
> regularly receive are: UTF-8, Windows-1255, ISO-8859-8-I.

[...]

> It would not transfer through the dbi-link, so I wrote a python script
> (see below) to read a row from mysql and write a row to postgresql
> (using pygresql and mysqldb).
> When I used pygresql's escape_bytea function to copy the data, it went
> smoothly, but the data was corrupt.
> When I tried the escape_string function it died because the data it was
> moving was not UTF-8.
>
> I finally got it to work by defining a database as SQL-ASCII and then
> using escape_string worked. After the data was all in place, I pg_dumped
> and pg_restored into a UTF-8 database and it surprisingly works now.

It's very difficult to know exactly what happened unless you can provide
an example of a byte sequence that illustrates what you describe:
how it looked in MySQL, how it looked in your Python script, and what you
fed to escape_bytea.
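
To make the comparison concrete, something along these lines could be used
to dump the bytes at each stage. This is only a sketch in Python 3 using
the standard library; the sample text and the helper names (hexdump,
is_valid_utf8) are made up for illustration and are not taken from your
script. In practice you would pass in the value read from MySQL and the
value you hand to the escape function.

```python
import binascii


def hexdump(label, data):
    """Return a one-line hex rendering of a bytes value, for comparison."""
    return "%s: %s" % (label, binascii.hexlify(data).decode("ascii"))


def is_valid_utf8(data):
    """Report whether the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: a four-letter Hebrew word, written with escapes so
# the example does not depend on the encoding of the source file.
text = "\u05e9\u05dc\u05d5\u05dd"

# The same text as it would arrive in a Windows-1255 mail and in a UTF-8 mail.
as_cp1255 = text.encode("cp1255")
as_utf8 = text.encode("utf-8")

print(hexdump("windows-1255", as_cp1255))
print(hexdump("utf-8", as_utf8))

# Windows-1255 bytes are generally not valid UTF-8, which is the kind of
# input a UTF-8 database would reject when it is sent as text.
print("windows-1255 bytes valid as UTF-8?", is_valid_utf8(as_cp1255))
print("utf-8 bytes valid as UTF-8?", is_valid_utf8(as_utf8))
```

If the hex output for one message is the same in MySQL, in the script and
in the bytea column, the bytes were copied unchanged; the first stage where
it differs is where the corruption was introduced.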

What client encoding did you use in your Python script?

Yours,
Laurenz Albe
