Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

From: Prasanth Reddy <dbadmin(at)nqadmin(dot)com>
To: pgsql-admin(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
Date: 2015-08-11 16:31:54
Message-ID: 55CA237A.4080300@nqadmin.com
Lists: pgsql-admin

Thanks for the prompt response. I was playing with it a bit more, and it seems that any character with a value less than 65533 works fine; I am guessing those are all valid Unicode characters. Does the server also
reject an insert/update that contains invalid characters? I took a character that is supposed to be invalid (displayed as a small box in the application using version 9.1) and pasted it into the
application using PostgreSQL 9.4, and I was able to save it to the database. Should this have failed?
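
A short Java sketch of a strict UTF-8 check may help with this question. The class and method names are hypothetical. It rests on one assumption that the thread does not confirm: that the pasted "small box" is U+FFFD (code point 65533, the Unicode replacement character). If so, the character is itself a well-formed UTF-8 sequence, which would explain why it could be saved, whereas a lone 0x92 byte is not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PostgreSQL or the JDBC driver.
final class Utf8Check {
    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0x92 is a continuation byte; on its own it is not valid UTF-8.
        byte[] stray = { (byte) 0x92 };
        // Assumption: the pasted box was U+FFFD, which encodes as EF BF BD.
        byte[] replacement = "\uFFFD".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(stray));       // false
        System.out.println(isValidUtf8(replacement)); // true
    }
}
```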

If I find and fix all these characters (which would be a huge task), I want to make sure that the database is not going to accept any new invalid characters. Please let me know if there is a configuration setting
I can change to do this. Another option I was thinking of is to change the encoding of the database itself to UTF8. Previously, pg_restore failed when I tried a
database encoding of UTF8; maybe if I fix the invalid characters and then do a dump, it would work.

Thanks,
Prasanth

Prasanth Reddy <dbadmin(at)nqadmin(dot)com> writes:
> I am currently running 9.1.9 and trying to upgrade to 9.4. I have done a dump and restore, when I start my java application I am getting the below error. The server uses SQL_ASCII encoding and the
> client encoding is UTF8. There are some invalid characters in the database but this has not caused a problem in the current version or 9.3 (tried a restore in 9.3 and the application works fine).

> ERROR: invalid byte sequence for encoding "UTF8": 0x92
> STATEMENT: SELECT * FROM client_data WHERE status_code = 0 ORDER BY client_name, description

You need to fix the encoding errors in your data. 9.4 is intentionally
less lax about that than prior versions.

Or, if you really want the database to be totally encoding-ignorant,
use SQL_ASCII as both client and server encoding. But if you have the
client declared to use UTF8, the server will try not to send anything
that isn't valid UTF8.
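
One possible way to fix such data is sketched below in Java. This is only an illustration, and it relies on an assumption that the thread does not state: that the bytes which are not valid UTF-8 were written by a Windows-1252 client, where 0x92 is the right single quotation mark (U+2019). The class and method names are hypothetical, and fetching the raw bytes from the database is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical repair helper, not part of PostgreSQL or the JDBC driver.
final class LegacyTextRepair {
    // Assumption: values that are not valid UTF-8 are Windows-1252 text.
    private static final Charset FALLBACK = Charset.forName("windows-1252");

    /** Decodes as strict UTF-8; if that fails, decodes the whole value as Windows-1252. */
    static String decode(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, FALLBACK);
        }
    }

    /** Returns the value re-encoded as valid UTF-8 bytes. */
    static byte[] toUtf8(byte[] raw) {
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = { 'i', 't', (byte) 0x92, 's' };
        System.out.println(decode(stored).equals("it\u2019s")); // true
    }
}
```

The fallback applies to the whole value, so a value that mixes valid multi-byte UTF-8 with Windows-1252 bytes would be decoded incorrectly; such rows would need to be inspected by hand.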

I believe the specific change that's biting you is

Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master Release: REL9_4_BR [49c817eab] 2014-02-23 15:22:50 -0500

Plug some more holes in encoding conversion.

Various places assume that pg_do_encoding_conversion() and
pg_server_to_any() will ensure encoding validity of their results;
but they failed to do so in the case that the source encoding is SQL_ASCII
while the destination is not. We cannot perform any actual "conversion"
in that scenario, but we should still validate the string according to the
destination encoding. Per bug #9210 from Digoal Zhou.

but there were some others of the same ilk in 9.4.

regards, tom lane
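
The behavior described in the commit message can be modeled roughly as follows. This Java sketch is a toy model written for illustration, not PostgreSQL's implementation; the class and method names are hypothetical. With a SQL_ASCII source there is nothing to convert, so the bytes pass through unchanged, but they are still validated against the destination encoding (UTF-8 here) and an error names the first offending byte.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Toy model only; names are hypothetical and this is not server code.
final class SqlAsciiPassThrough {
    /**
     * Returns the stored bytes unchanged if they are valid UTF-8,
     * otherwise throws with a message naming the first offending byte.
     */
    static byte[] toUtf8Client(byte[] stored) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(stored);
        // UTF-8 decoding never yields more chars than input bytes.
        CharBuffer out = CharBuffer.allocate(stored.length);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            // On a malformed-input result the buffer position marks the bad bytes.
            int bad = stored[in.position()] & 0xFF;
            throw new IllegalArgumentException(
                String.format("invalid byte sequence for encoding \"UTF8\": 0x%02x", bad));
        }
        return stored; // no conversion is performed, only validation
    }

    public static void main(String[] args) {
        try {
            toUtf8Client(new byte[] { 'i', 't', (byte) 0x92, 's' });
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```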

On 08/11/2015 09:59 AM, Prasanth Reddy wrote:
> Hi,
>
> I have posted a question about this same issue on JDBC thinking it is a driver issue. I was told this error is generated by the back-end itself rather than by the driver so posting the question in
> admin forum. See discussion on this here http://www.postgresql.org/list/pgsql-jdbc/since/201508080000/
>
> I am currently running 9.1.9 and trying to upgrade to 9.4. I have done a dump and restore, when I start my java application I am getting the below error. The server uses SQL_ASCII encoding and the
> client encoding is UTF8. There are some invalid characters in the database but this has not caused a problem in the current version or 9.3 (tried a restore in 9.3 and the application works fine).
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x92
> STATEMENT: SELECT * FROM client_data WHERE status_code = 0 ORDER BY client_name, description
>
>
> org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x92
>> at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
>> at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
>> at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
>> at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570)
>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:420)
>> at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:305)
>> at com.sun.rowset.JdbcRowSetImpl.execute(JdbcRowSetImpl.java:567)
>
> Same error with postgresql-9.4-1201.jdbc4.jar & postgresql-9.1-902.jdbc4.jar.
>
> Appreciate your help.
>
> Thanks,
> Prasanth
>
