Re: [JDBC] Invalid Character Data Problem

From: Kris Jurka <books(at)ejurka(dot)com>
To: Hunter Hillegas <lists(at)lastonepicked(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>, Postgre JDBC List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [JDBC] Invalid Character Data Problem
Date: 2004-11-30 06:34:46
Message-ID: Pine.BSO.4.56.0411300128400.10837@leary.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-jdbc

On Fri, 26 Nov 2004, Hunter Hillegas wrote:

> When I SELECT from a certain table, I see this JDBC exception:
>
> "Invalid character data was found. This is most likely caused by stored
> data containing characters that are invalid for the character set the
> database was created in. The most common example of this is storing 8bit
> data in a SQL_ASCII database."
>
> The database is indeed of type SQL_ASCII. The table stores mailing list data
> and has about 400,000 rows.
>
> Looking at the data via psql, I see that some of the rows have strange
> characters in them, such as question marks where I would not expect them,
> etc...
>
> What are my options? Is there a way to identify the 'bad' records, or the
> ones causing trouble?
>

To really solve this problem you need to have a correctly encoded
database. This will involve a dump and restore process and possibly
recoding your data. This is straightforward if you know what
encoding your data is, although it will cause some downtime.

To detect the bad data you can try various SELECTs with the JDBC driver
and see what errors out. The function below will determine if a
particular field has data with the high bit set which is something the
database really doesn't know what to do with.

SELECT pkcolumn, hashighbit(columna), hashighbit(columnb) FROM mytable;

Kris Jurka

CREATE OR REPLACE FUNCTION hashighbit(text) RETURNS boolean AS '
DECLARE
i int;
BEGIN
i := LENGTH($1);
WHILE i > 0 LOOP
IF ascii(substring($1, i, 1)) >= 128 THEN
RETURN true;
END IF;
i := i-1;
END LOOP;
RETURN false;
END;
' LANGUAGE 'plpgsql';

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Hunter Hillegas 2004-11-30 06:49:23 Re: [JDBC] Invalid Character Data Problem
Previous Message Michael Fuhr 2004-11-30 06:13:10 Re: increase the column size

Browse pgsql-jdbc by date

  From Date Subject
Next Message Hunter Hillegas 2004-11-30 06:49:23 Re: [JDBC] Invalid Character Data Problem
Previous Message Kris Jurka 2004-11-30 06:18:05 Re: Translation update: pt_BR