Re: [SPAM]-D] How to find broken UTF-8 characters ?

From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: [SPAM]-D] How to find broken UTF-8 characters ?
Date: 2010-04-30 10:21:03
Message-ID: hreauf$cma$1@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 2010-04-29, Andreas <maps(dot)on(at)gmx(dot)net> wrote:
> Hi,
>
> while writing the reply below I found it sounds like beeing OT but it's
> actually not.
> I just need a way to check if a collumn contains values that CAN NOT be
> converted from Utf8 to Latin1.
> I tried:
> Select convert_to (my_column::text, 'LATIN1') from my_table;
>
> It raises an error that says translated:
> ERROR: character 0xe28093 in encoding »UTF8« has no equivalent in »LATIN1«

use a regular expression.

ISO8859-1 is easy, all the caracters a grouped together in unicode so
the regular expression consists of a single inverted range class

SELECT pkey FROM tabname
WHERE ( textfield || textfiled2 || textfield3 ) ~ ('[^'||chr(1)||'-'||chr(255)||']');

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Jasen Betts 2010-04-30 10:34:40 Re: [SPAM]-D] How to find broken UTF-8 characters ?
Previous Message DM 2010-04-30 00:06:45 Re: problem converting strings to timestamps with time zone