From: | dennis jenkins <dennis(dot)jenkins(dot)75(at)gmail(dot)com> |
---|---|
To: | Scott Ribe <scott_ribe(at)elevated-dev(dot)com> |
Cc: | PostgreSQL general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: finding bogus UTF-8 |
Date: | 2011-02-10 22:18:03 |
Message-ID: | AANLkTikW0Yb8ZyULMVLgdLsv-A+6O_TF3rHmtdxX8mUG@mail.gmail.com |
Lists: | pgsql-general |
On Thu, Feb 10, 2011 at 1:02 PM, Scott Ribe <scott_ribe(at)elevated-dev(dot)com> wrote:
> I know that I have at least one instance of a varchar that is not valid UTF-8, imported from a source with errors (AMA CPT files, actually) before PG's checking was as stringent as it is today. Can anybody suggest a query to find such values?
If you know which table and column the data is in, you can also do
something like this:
(I typed this up without checking its syntax. The basic idea is
to cast the column to bytea, encode it with the 'escape' method, then
grep for backslashes, which mark non-ASCII or non-printable bytes.)
select * from bad_table
where encode(bad_column::bytea, 'escape') ~ '\\\\';
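If the suspect rows can be pulled out of the database as raw bytes, the same check can be done client-side. A minimal sketch in Python (the function name and sample data are illustrative, not from this thread; it assumes each row's column value is available as a bytes object):

```python
# Sketch: report rows whose bytes are not valid UTF-8.
# Assumes the column values were exported as raw bytes
# (e.g. via a bytea cast); strict decoding flags any bad sequence.
def find_bogus_utf8(rows):
    """Return (index, raw_bytes) pairs that fail strict UTF-8 decoding."""
    bad = []
    for i, raw in enumerate(rows):
        try:
            raw.decode('utf-8', errors='strict')
        except UnicodeDecodeError:
            bad.append((i, raw))
    return bad

# Hypothetical sample: two valid values, one with a bogus byte sequence.
sample = [b'valid ascii', 'caf\u00e9'.encode('utf-8'), b'\xff\xfebogus']
print(find_bogus_utf8(sample))  # only the last row is reported
```

Unlike the escape-and-grep query above, this flags only genuinely invalid UTF-8, not every non-ASCII byte.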