Re: How to remove non-UTF values from a table?

From: Howard Cole <howardnews(at)selestial(dot)com>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to remove non-UTF values from a table?
Date: 2009-12-15 13:26:54
Message-ID: 4B278E9E.1080907@selestial.com
Lists: pgsql-general

Phoenix Kiula wrote:
> An easy question for some I hope.
>
> I have a DB from 8.2 days that when I now dump and try to take into
> the 8.3.7, it gives me errors about utf-8 stuff.
>
> I tried searching this list's archives but could not come up with an answer.
>
> Google returns some sites like these:
> http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
> but I'm not clear on how to use them.
>
> Following the SQL on this site I could identify some columns that
> contain text like this:
>
> "Évolution générale de la situation démographique"
>
> So my guess is that the non-English characters were originally not
> getting written in proper utf-8 variants.
>
> Is there any SQL possibility to find these columns and replace them
> with utf-8 equivalents using some postgresql commands? Couldn't find
> anything in the "Strings functions" (chapter 9 of manual).
>
> We're on CentOS.
>
> Thanks!
>
>
My recommendation would be to install the iconv utility and run it on a
plain-text (pg_dump -Fp) backup, as suggested in the article Google
returned, and then reimport the cleaned UTF-8 dump.
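
If iconv is not convenient, the same idea can be sketched in Python. This is only a rough illustration, not a tested migration tool: it assumes the broken values were originally written as Latin-1 (which text like "Évolution générale" suggests, but check your own data), and the function names and file names are made up for the example.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else re-encode it.

    The fallback encoding is an assumption about how the bad bytes
    were originally written; Latin-1 can decode any byte sequence.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")


def clean_dump(src_path: str, dst_path: str, fallback: str = "latin-1") -> int:
    """Copy a plain-text dump line by line, repairing invalid lines.

    Returns the number of lines that had to be re-encoded.
    """
    changed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            fixed = to_utf8(line, fallback)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed


if __name__ == "__main__":
    # Hypothetical file names: a dump made with pg_dump -Fp.
    count = clean_dump("backup.sql", "backup_utf8.sql")
    print(f"re-encoded {count} line(s)")
```

Working line by line means lines that are already valid UTF-8 pass through untouched, and only the offending lines are reinterpreted. A line that mixes both encodings would be treated as Latin-1 throughout, so it is worth diffing the two files before reimporting.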

I am surprised that you managed to load the original backup into 8.3 at
all, because 8.3 seems to be much stricter about encoding. Or is your
database not in UTF-8?

Howard
www.selestial.com
