Re: BUG #14197: ERROR: character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8"

From: "Albin, Lloyd P" <lalbin(at)scharp(dot)org>
To: John R Pierce <pierce(at)hogranch(dot)com>, "sheri(dot)bhavani(at)cognizant(dot)com" <sheri(dot)bhavani(at)cognizant(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14197: ERROR: character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8"
Date: 2016-06-17 17:34:41
Message-ID: AE011E7AE62117479360E1E2BD341F4EE58137E3@adama.fhcrc.org
Lists: pgsql-bugs

I have sent sheri.bhavani a presentation I wrote on how to fix this issue after we ran into the same problem.

When I was upgrading our SQL_ASCII database from Postgres 9.2 to 9.4, I found that some of our queries would no longer work. This is due to changes in the internal functions: if they run across multiple encodings, they now throw an error. This forced us to revisit the project of converting all our data to UTF8. The problem was finding all that data and getting it converted. We had ASCII, UTF8, LATIN1, and WIN1252 encodings within our database, and sometimes different encodings for different rows of data within the same table.

In this presentation I will show you the functions to write, with code supplied, to find all the non-ASCII characters in your database so that you can then manually figure out how to back up each section of data. Then we will cover backing up and restoring each of those sections of data.
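To illustrate the kind of check involved in finding mixed-encoding rows, here is a minimal Python sketch. It is not the code from the presentation; the function name `classify_bytes` and the four labels are hypothetical, and it assumes you have already pulled the raw bytes of a text value out of the database by some means. It only sorts values into buckets; telling LATIN1 apart from WIN1252 still needs a human to look at the data.

```python
def classify_bytes(raw: bytes) -> str:
    """Label a raw value by the most specific encoding it satisfies.

    Labels (hypothetical, for illustration only):
      "ascii"               - only 7-bit bytes, safe in any of the encodings
      "utf8"                - non-ASCII bytes that form valid UTF-8
      "single-byte"         - not valid UTF-8, but every byte maps in WIN1252
                              (could equally be LATIN1; needs manual review)
      "unmapped-in-win1252" - contains a byte WIN1252 leaves undefined
    """
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf8"
    except UnicodeDecodeError:
        pass
    try:
        # Python's cp1252 codec rejects the bytes Windows-1252 leaves undefined.
        raw.decode("cp1252")
        return "single-byte"
    except UnicodeDecodeError:
        return "unmapped-in-win1252"
```

Running every text value through a check like this gives a rough inventory of which rows need which conversion before the data is loaded into a UTF8 database.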

If anyone else is interested, drop me a note and I will send you a copy of the presentation.

--
Lloyd Albin
Database Manager / Statistical Center for HIV/AIDS Research and Prevention (SCHARP) / lalbin(at)fredhutch(dot)org / Fred Hutch / Cures Start Here

From: pgsql-bugs-owner(at)postgresql(dot)org [mailto:pgsql-bugs-owner(at)postgresql(dot)org] On Behalf Of John R Pierce
Sent: Friday, June 17, 2016 1:04 AM
To: sheri(dot)bhavani(at)cognizant(dot)com; pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #14197: ERROR: character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8"

On 6/16/2016 10:08 PM, sheri(dot)bhavani(at)cognizant(dot)com wrote:
• ERROR: character with byte sequence 0x81 in encoding "WIN1252" has no equivalent in encoding "UTF8" is thrown in PostgreSQL 9.5.3.

per https://en.wikipedia.org/wiki/Windows-1252, 0x81 is not a valid character in encoding WIN1252, so it can't be converted to UTF8.

you need to determine which field of which row of which table has that value in it and change it to something valid (perhaps 0x20?) before you can load this data into a UTF8 database.
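As a rough sketch of that locate-and-replace step, the Python below scans raw bytes for the values Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) and substitutes a single replacement byte. The function names are hypothetical, and it assumes the data is available as raw bytes, for example from a plain-text dump file; it does not talk to PostgreSQL itself.

```python
# Byte values that Windows-1252 leaves unassigned.
UNDEFINED_WIN1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def find_undefined_win1252(data: bytes) -> list:
    """Return (offset, byte value) pairs for bytes with no WIN1252 mapping."""
    return [(i, b) for i, b in enumerate(data) if b in UNDEFINED_WIN1252]


def replace_undefined_win1252(data: bytes, replacement: int = 0x20) -> bytes:
    """Replace each unassigned byte with `replacement` (a space by default)."""
    table = bytes(replacement if b in UNDEFINED_WIN1252 else b
                  for b in range(256))
    return data.translate(table)


if __name__ == "__main__":
    sample = b"caf\xe9 \x81x"
    print(find_undefined_win1252(sample))
    # After replacement the bytes decode cleanly as WIN1252.
    print(replace_undefined_win1252(sample).decode("cp1252"))
```

Whether a space is the right substitute depends on what the byte was meant to be, so it is worth reviewing the offsets reported by `find_undefined_win1252` before replacing anything.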

--

john r pierce, recycling bits in santa cruz
