Re: Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: dbadmin(at)nqadmin(dot)com
Cc: List <pgsql-jdbc(at)postgresql(dot)org>, John R Pierce <pierce(at)hogranch(dot)com>
Subject: Re: Re: Fwd: Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
Date: 2015-08-10 12:51:18
Message-ID: CADK3HHLJi0vCKPwvS2AhNvm0rBoT242RuBegQ2syEANYbtdrvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Well your data is still probably wrong. You should fix it

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On 9 August 2015 at 14:44, Prasanth Reddy <dbadmin(at)nqadmin(dot)com> wrote:

> I am able to restore the same dump in 9.3 and it works fine. So seems like
> 9.4 is much more stringent than all prior versions.
>
> Thanks,
> Prasanth
>
>
> On 08/08/2015 05:34 PM, Prasanth Reddy wrote:
>
> 1. If postgresql always does this I am not sure how it is working in 9.1.
>
> 2. I am able to view the content in pgadmin, why would the server not
> complain when viewing from pgadmin. Not sure what the client encoding would
> be when using pgadmin.
>
> 3. Thought SQL_ASCII means the text is stored as ASCII format, may be I am
> way off base on this. Is there a way to check the notes column for invalid
> UTF8 characters and remove them?
>
> 4. Based on the error there is invalid data in database so the database
> should have allowed this invalid data to come in some how right?
>
> Any suggestions as to how to get this working in 9.4? Right now the
> database is in 9.1 and working fine so may be there is a way to fix these
> issues before doing a dump for 9.4?
>
> Thanks,
> Prasanth
>
> On 8/8/2015 2:40 PM, Prasanth Reddy wrote:
> > 1. Is the content vetting new in 9.4?
>
> no, PostgreSQL has always been strict about content types.
>
> > 2. Is there an option to ask the driver not to vet the content? Some
> times when the users copy notes from a word document there will be some
> characters invalid in UTF8. Most of these should only be in
> > notes columns.
>
> the driver isn't doing this, its happening in the postgres
> database. you should fix invalid input at the source application
> before sending to the database.
>
> > 3. Is there a way to check the content and fix it/remove invalid
> characters?
>
> define 'invalid character', then fix it. this SQL_ASCII field that
> contains 0x92, what character code is that supposed to represent ?
>
> > 4. Is there an option to force the driver to not update database with
> invalid characters?
>
> the database (NOT the driver) does that already. if you try and insert
> invalid data, you get an error and have to rollback the transaction.
>
>
> --
> john r pierce, recycling bits in santa cruz
>
> On 08/08/2015 04:40 PM, Prasanth Reddy wrote:
>
>
>
>
> -------- Forwarded Message -------- Subject: Re: Re: Postgresql 9.4.4 -
> ERROR: invalid byte sequence for encoding "UTF8": 0x92 Date: Sat, 08 Aug
> 2015 16:39:17 -0500 From: Prasanth Reddy <dbadmin(at)nqadmin(dot)com>
> <dbadmin(at)nqadmin(dot)com> Reply-To: dbadmin(at)nqadmin(dot)com To: Bosco Rama
> <postgres(at)boscorama(dot)com> <postgres(at)boscorama(dot)com>
>
> 1. Is the content vetting new in 9.4?
> 2. Is there an option to ask the driver not to vet the content? Some times when the users copy notes from a word document there will be some characters invalid in UTF8. Most of these should only be in
> notes columns.
> 3. Is there a way to check the content and fix it/remove invalid characters?
> 4. Is there an option to force the driver to not update database with invalid characters?
>
> Appreciate your help.
>
> Thanks,
> Prasanth
>
> ------------------Original Message--------------------------
>
> On 08/08/15 10:49, Prasanth Reddy wrote:
> >
> [snip]
> > 12:42:02.505 (1) <=BE ParameterStatus(client_encoding = UTF8)
> [snip]
> > 12:42:02.506 (1) <=BE ParameterStatus(server_encoding = SQL_ASCII)
>
> There's your problem. The client is expecting UTF8 while the server
> is storing un-vetted SQL_ASCII. You will need to change one of the
> components' choice of encoding and/or enforce content conformance in the
> data. Another choice, in cases of known and expected non-compliance, is
> to use bytea casting to bypass the automated content vetting. But your
> choice in this case is going to be driven by the ease with which you can
> alter your application vs the server-side changes that would be needed.
>
> HTH,
> Bosco.
>
>
>
>
>
>
>

In response to

Browse pgsql-jdbc by date

  From Date Subject
Next Message Bosco Rama 2015-08-10 17:14:35 Re: Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92
Previous Message Gabriel E. Sánchez Martínez 2015-08-10 02:38:15 Re: Bad value for type int