COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence

From: Steven Schlansker <steven(at)trumpet(dot)io>
To: pgsql-bugs(at)postgresql(dot)org
Subject: COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence
Date: 2010-08-18 23:11:31
Message-ID: 8F72262C-5694-4626-A87F-00604FB5E1D6@trumpet.io
Lists: pgsql-bugs pgsql-hackers

Hello fine PostgreSQL bug-busters,

I'm having a rather annoying problem - a particular string causes the Postgres COPY functionality to lose a byte, which corrupts backups and transferred data.

First, the environment -

PostgreSQL 8.4.4 on i386-apple-darwin10.3.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit

Mac OS X 10.6.4

[steven(at)xxx:~]% psql --version
psql (PostgreSQL) 8.4.4
contains support for command-line editing

Now, the setup:
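 Name  |       Owner        | Encoding |  Collation  |    Ctype    | Access privileges |  Size  | Tablespace | Description
-------+--------------------+----------+-------------+-------------+-------------------+--------+------------+-------------
 baddb | xxxxxxx_production | UTF8     | en_US.utf-8 | en_US.utf-8 |                   | 207 MB | pg_default |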

baddb=> create table badtable (a int, b int, c character varying, d character varying, e character varying, f character varying[], g text, h character varying[], i character varying[], j character varying[], k character varying[], l character varying[], m character varying[], n character varying[], o character varying, p character varying);

baddb=> \copy badtable from '/tmp/data.copy'
baddb=> \copy badtable to '/tmp/badness.copy'
baddb=> \copy badtable from '/tmp/badness.copy'
ERROR: invalid byte sequence for encoding "UTF8": 0xcf2c
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY badtable, line 1
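
For anyone decoding the error: 0xcf is the lead byte of a two-byte UTF-8 sequence and 0x2c is an ASCII comma, so the continuation byte that should follow 0xcf appears to be missing. The Python sketch below illustrates this. It assumes, as one possibility that nothing here confirms, that the affected character is the Greek upsilon (U+03C5, bytes 0xcf 0x85) directly before a comma. The helper name first_invalid_utf8 is hypothetical.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, two bytes starting at offset) for the first
    UTF-8 decode error in data, or None if data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.start + 2]
    return None


# Assumption: the damaged character is Greek upsilon (U+03C5) followed by a comma.
intact = "\u03c5,".encode("utf-8")

# Simulate the loss of the single continuation byte at index 1.
damaged = intact[:1] + intact[2:]

print(intact.hex())   # cf852c
print(damaged.hex())  # cf2c
print(first_invalid_utf8(intact))
print(first_invalid_utf8(damaged))
```

If that assumption holds, the damaged pair matches the 0xcf2c reported in the error message. It says nothing about where in the COPY path the byte goes missing.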

Obviously, this wouldn't be too helpful without the datafile in question:

1 2377510 FOURSQUARE 1403504 Pizza Hut {} \N {} {} {} {Pizza} {πίτσα,hut,food,ζωγράφου,pizza,eat,zografou} {} \N \N \N

Since this is likely to be eaten by various mail clients or lost in translation, please find attached a TGZ of the data file in question.

Attachment Content-Type Size
data.tgz application/octet-stream 260 bytes
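
To find where the exported file diverges from the original, a byte-level comparison along the following lines may help. This is a minimal sketch: the function names are hypothetical, and the paths in the usage comment are simply the ones from the session above.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first byte at which a and b differ,
    or None if they are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Read both files in binary mode and report the first differing offset."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())


# Usage, assuming both files from the session exist:
#   offset = compare_files("/tmp/data.copy", "/tmp/badness.copy")
#   print(offset)
```

Reading in binary mode matters here, because a text-mode read would fail or substitute characters at exactly the bytes under investigation.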
