Re: [GENERAL] postgres & server encodings

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Joel Fradkin <jfradkin(at)wazagua(dot)com>, "'Salem Berhanu'" <salemb4(at)hotmail(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] postgres & server encodings
Date: 2005-08-09 17:31:03
Message-ID: 17412.1123608663@sss.pgh.pa.us
Lists: pgsql-admin pgsql-general

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> The problem only shows up when you have mixed data -- say, you have two
> applications, one website in PHP which inserts data in Latin-1, and a
> Windows app which inserts in UTF-8. In this case your data will be a
> mess to fix, and there's no way a single conversion will get it right.
> You will have to manually separate the parts that are UTF8 from the
> Latin1, and import them separately. Not a position I'd like to be in.

The only helpful tip I can think of is that you can try to import data
into a UTF8 database and see if it gets rejected as badly encoded; this
will at least give you a weak tool to separate what's what.
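
The same check can be tried outside the database before importing. What follows is only a rough sketch in Python, not anything PostgreSQL itself provides: the function name `split_by_utf8_validity` and the sample rows are made up for illustration, and it assumes the data is available as raw byte strings, one per row.

```python
def split_by_utf8_validity(rows):
    """Partition raw byte strings into (valid_utf8, invalid_utf8).

    Hypothetical helper: mimics the "does a UTF8 database reject it?"
    test by attempting a strict UTF-8 decode of each row.
    """
    valid, invalid = [], []
    for raw in rows:
        try:
            raw.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            invalid.append(raw)   # definitely not UTF-8, so probably Latin1
        else:
            valid.append(raw)     # UTF-8, or plain ASCII (same in both)
    return valid, invalid


# Illustrative sample rows, not real data.
samples = [
    "café".encode("utf-8"),    # b"caf\xc3\xa9"
    "café".encode("latin-1"),  # b"caf\xe9"
    b"plain ascii",
]
valid, invalid = split_by_utf8_validity(samples)
```

The tool is weak in the sense that a rejected row is certainly not UTF-8, but an accepted row is not proven to be: pure-ASCII rows pass either way, and a Latin1 row can occasionally happen to form a valid UTF-8 sequence.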

I'm afraid the reverse direction won't help much: in single-byte
encodings such as Latin1 every byte value is a valid character, so
nothing gets rejected and you can't do any simple filtering in that
direction. In the end you're going to have to eyeball a lot of data
for plausibility :-(
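
To illustrate why the Latin1 direction gives no signal, here is a small Python sketch. It uses Python's `latin-1` codec as a stand-in for a single-byte server encoding, which is an assumption for illustration. The helper `looks_like_misread_utf8` is hypothetical and only a heuristic to narrow down what needs eyeballing, not a reliable detector.

```python
# Every possible byte value decodes under a single-byte encoding,
# so decoding as Latin1 never raises an error.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# UTF-8 data read as Latin1 is accepted silently and comes out garbled.
misread = "é".encode("utf-8").decode("latin-1")


def looks_like_misread_utf8(text):
    """Heuristic (hypothetical helper): True if re-encoding the text as
    Latin1 yields non-ASCII bytes that form valid UTF-8."""
    try:
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        return False              # contains characters outside Latin1
    if raw.isascii():
        return False              # pure ASCII tells us nothing
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False              # genuine Latin1 text usually lands here
    return True
```

A hit from this heuristic is only a hint that a value was UTF-8 stored as Latin1; the flagged rows still need a human look for plausibility.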

regards, tom lane
