Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: "Vitaly V(dot) Voronov" <wizard_1024(at)tut(dot)by>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"
Date: 2022-05-29 09:19:10
Message-ID: 093b07ad-280c-4741-4519-f8db72420ffb@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Sounds a lot like a bug in commit
f82de5c46bdf8cd65812a7b04c9509c218e1545d. Thanks for the report, I'll
investigate!

- Heikki

On 28/05/2022 23:57, Vitaly V. Voronov wrote:
> Hello,
> Right commands:
> # Imported without errors
> for i in $(seq 1 207); do echo
> "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工
> 事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いた
> します。※詳細はこちら【工事について】https://www.test.jp/1234
> /5678.html&id=12211 <https://www.test.jp/1234/5678.html&id=12211>" >>
> /tmp/test_pass.csv; done;
> # Imported with errors
> for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろ
> しくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準
> 備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】
> https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_fail.csv; done;
>
>
> 28.05.2022, 23:53, "PG Bug reporting form" <noreply(at)postgresql(dot)org>:
>
> The following bug has been logged on the website:
>
> Bug reference: 17501
> Logged by: Vitaly Voronov
> Email address: wizard_1024(at)tut(dot)by <mailto:wizard_1024(at)tut(dot)by>
> PostgreSQL version: 14.3
> Operating system: CentOS Linux release 7.9.2009 (Core)
> Description:
>
> Hello,
>
> We've seen a such bug: COPY command shows error "ERROR: invalid byte
> sequence for encoding "UTF8": 0xe5" on file.
> The same file with small amount of lines is imported without any errors.
>
> How to reproduce bug:
> # create database
> # create database with
> # SQL_ASCII, C, C
> createdb --encoding=SQL_ASCII --lc-collate=C --lc-ctype=C
> --template=template0 test
>
> # connect to the database
> psql test
>
> # Create table
> CREATE TABLE test_data (
>     test_data text
> );
>
> # Import without error
> truncate table test_data;
> COPY test_data (test_data) FROM '/tmp/test_pass.csv' WITH DELIMITER
> AS ','
> CSV QUOTE AS '"';
>
> COPY 207
>
> # Import with error
> truncate table test_data;
> COPY test_data (test_data) FROM '/tmp/test_fail.csv' WITH DELIMITER
> AS ','
> CSV QUOTE AS '"';
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe5
> CONTEXT: COPY test_data, line 627
>
> # both files contains the same rows, but test_fail contains more rows
> # seems that the file more than 65K size cannot be imported
> # if create DB with UTF8 encoding instead of SQL_ASCII - both tests
> will be
> passed
>
> # How to generate files:
> # Imported without errors
> for i in $(seq 1 207); do echo
> "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋
> 外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご
> 連絡いたします。※詳細はこちら【工事について】https://www.test.jp
> /1234/5678.html&id=12211 <https://www.test.jp/1234/5678.html&id=12211>"
>
>  /tmp/test_pass.csv; done;
>
> # Imported with errors
> for i in $(seq 1 5722); do echo
> "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋
> 外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご
> 連絡いたします。※詳細はこちら【工事について】https://www.test.jp
> /1234/5678.html&id=12211 <https://www.test.jp/1234/5678.html&id=12211>"
>
>  /tmp/test_fail.csv; done;
>
>
> # Both files can be imported without any problem to PostgreSQL 11.
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Heikki Linnakangas 2022-05-29 10:39:52 Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"
Previous Message Noah Misch 2022-05-29 04:34:03 Re: Extension pg_trgm, permissions and pg_dump order