Re: finding bogus UTF-8

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
Cc: PostgreSQL general <pgsql-general(at)postgresql(dot)org>
Subject: Re: finding bogus UTF-8
Date: 2011-02-15 21:21:16
Message-ID: AANLkTi==bd28k_J2=Dg0kcLD_mMTrLByCGrS+PHk1U-s@mail.gmail.com
Lists: pgsql-general

On Thu, Feb 10, 2011 at 9:02 PM, Scott Ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
> I know that I have at least one instance of a varchar that is not valid UTF-8, imported from a source with errors (AMA CPT files, actually) before PG's checking was as stringent as it is today. Can anybody suggest a query to find such values?

CREATE OR REPLACE FUNCTION is_utf8(text)
RETURNS bool AS $$
    # args[0] is the text argument; under plpythonu (Python 2) it is a byte string
    try:
        args[0].decode('utf8')
        return True
    except UnicodeDecodeError:
        return False
$$ LANGUAGE plpythonu STRICT;
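
The same decode-and-catch check can be tried outside the database, for instance against raw bytes taken from a dump file. Below is a minimal sketch in Python 3, where the input has to be a bytes object; the name is_utf8_bytes and the sample values are only illustrations and are not part of the function above.

```python
def is_utf8_bytes(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample values: the second is Latin-1 encoded, so its
# lone 0xE9 byte is not a valid UTF-8 sequence.
samples = [b"caf\xc3\xa9", b"caf\xe9", b"plain ascii"]
bad = [s for s in samples if not is_utf8_bytes(s)]
```

Here bad collects only the values that fail to decode, which is the same filtering a query would do by calling is_utf8() on the suspect column.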

--
marko
