BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: wizard_1024(at)tut(dot)by
Subject: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"
Date: 2022-05-28 20:52:19
Message-ID: 17501-128b1dd039362ae6@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17501
Logged by: Vitaly Voronov
Email address: wizard_1024(at)tut(dot)by
PostgreSQL version: 14.3
Operating system: CentOS Linux release 7.9.2009 (Core)
Description:

Hello,

We have run into the following bug: the COPY command fails on a file with
"ERROR: invalid byte sequence for encoding "UTF8": 0xe5".
A file with the same rows but fewer lines is imported without any errors.

How to reproduce the bug:
# Create a database with encoding SQL_ASCII, lc_collate C, lc_ctype C
createdb --encoding=SQL_ASCII --lc-collate=C --lc-ctype=C --template=template0 test

# connect to the database
psql test

# Create table
CREATE TABLE test_data (
test_data text
);

# Import without error
truncate table test_data;
COPY test_data (test_data) FROM '/tmp/test_pass.csv' WITH DELIMITER AS ','
CSV QUOTE AS '"';

COPY 207

# Import with error
truncate table test_data;
COPY test_data (test_data) FROM '/tmp/test_fail.csv' WITH DELIMITER AS ','
CSV QUOTE AS '"';

ERROR: invalid byte sequence for encoding "UTF8": 0xe5
CONTEXT: COPY test_data, line 627

# Both files contain identical rows; test_fail.csv just contains more of them.
# It seems that a file larger than about 65K cannot be imported.
# If the database is created with UTF8 encoding instead of SQL_ASCII, both
# imports succeed.

# How to generate files:
# Imported without errors
for i in $(seq 1 207); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_pass.csv; done;
# Imported with errors
for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_fail.csv; done;

# Both files can be imported into PostgreSQL 11 without any problem.
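
To check the guess that the failure is tied to a size of about 65K, the size of each file can be estimated from the row length without generating the files. The following is only a sketch: the variable names (LINE, line_bytes, pass_bytes, fail_bytes, limit) are made up for illustration, the 65536-byte limit is the suspected threshold rather than a confirmed one, and the script is assumed to be saved as UTF-8 so that the row has the same bytes as in the files above.

```shell
#!/bin/sh
# Sketch only: estimate the size of both test files from the row length.
# Assumption: this script is saved as UTF-8, like the generated CSV files.
LINE='NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211'

# Bytes per row, including the trailing newline that echo appends.
line_bytes=$(printf '%s\n' "$LINE" | wc -c | tr -d ' ')

pass_bytes=$((line_bytes * 207))     # rows in test_pass.csv
fail_bytes=$((line_bytes * 5722))    # rows in test_fail.csv
limit=65536                          # 64 KiB, the suspected threshold (assumption)

echo "bytes per row: $line_bytes"
echo "test_pass.csv: $pass_bytes bytes"
echo "test_fail.csv: $fail_bytes bytes"

# Report whether the two sizes fall on opposite sides of the suspected limit.
if [ "$pass_bytes" -lt "$limit" ] && [ "$fail_bytes" -gt "$limit" ]; then
    echo "sizes are consistent with a 64 KiB threshold"
else
    echo "sizes do not straddle 64 KiB"
fi
```

The estimates can be compared with the real files using "wc -c /tmp/test_pass.csv /tmp/test_fail.csv".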
