Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"

From: Vitaly V(dot) Voronov <wizard_1024(at)tut(dot)by>
To: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"
Date: 2022-05-28 20:57:25
Message-ID: 3459021653771303@mail.yandex.by
Lists: pgsql-bugs

Hello,

Here are the correct commands for generating the test files:

# Imported without errors
for i in $(seq 1 207); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_pass.csv; done;

# Imported with errors
for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_fail.csv; done;


28.05.2022, 23:53, "PG Bug reporting form" <noreply(at)postgresql(dot)org>:
> The following bug has been logged on the website:
>
> Bug reference: 17501
> Logged by: Vitaly Voronov
> Email address: wizard_1024(at)tut(dot)by
> PostgreSQL version: 14.3
> Operating system: CentOS Linux release 7.9.2009 (Core)
> Description:
>
> Hello,
>
> We've run into the following bug: the COPY command fails with the error
> "ERROR: invalid byte sequence for encoding "UTF8": 0xe5" on a file.
> The same file with fewer lines is imported without any errors.
>
> How to reproduce the bug:
> # Create a database with SQL_ASCII encoding and C collation/ctype
> createdb --encoding=SQL_ASCII --lc-collate=C --lc-ctype=C --template=template0 test
>
> # Connect to the database
> psql test
>
> # Create the table
> CREATE TABLE test_data (
>     test_data text
> );
>
> # Import without error
> truncate table test_data;
> COPY test_data (test_data) FROM '/tmp/test_pass.csv' WITH DELIMITER AS ','
> CSV QUOTE AS '"';
>
> COPY 207
>
> # Import with error
> truncate table test_data;
> COPY test_data (test_data) FROM '/tmp/test_fail.csv' WITH DELIMITER AS ','
> CSV QUOTE AS '"';
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe5
> CONTEXT: COPY test_data, line 627
>
> # Both files contain the same rows, but test_fail contains more of them.
> # It seems that a file larger than about 65K cannot be imported.
> # If the DB is created with UTF8 encoding instead of SQL_ASCII, both tests pass.
>
> # How to generate the files:
> # Imported without errors
> for i in $(seq 1 207); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_pass.csv; done;
>
> # Imported with errors
> for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_fail.csv; done;
>
> # Both files can be imported into PostgreSQL 11 without any problem.
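The report's guess is that only files larger than about 65K fail. A minimal shell sketch for checking the input-file side of that claim, assuming a POSIX shell with `seq`, `wc` and `iconv` on PATH and reading "65K" as 65536 bytes (the `make_file` helper is hypothetical, not part of the report): it regenerates both files from the same row, redirecting once per file instead of using `>>` on every run so that repeated runs do not keep appending rows, then prints each file's byte size and whether it is valid UTF-8. It says nothing about what COPY does internally.

```shell
#!/bin/sh
# Sketch only: inspects the two input files from the report, not COPY itself.
# Assumptions: seq, wc and iconv are available; "65K" is read as 65536 bytes.
set -eu

# The row repeated in both files (copied from the report).
row='NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211'

# Hypothetical helper: write the row $1 times into file $2.
# The redirection covers the whole loop, so the file is truncated once
# and rerunning the script does not append extra rows (unlike ">>").
make_file() {
  for i in $(seq 1 "$1"); do
    printf '%s\n' "$row"
  done > "$2"
}

make_file 207 /tmp/test_pass.csv
make_file 5722 /tmp/test_fail.csv

for f in /tmp/test_pass.csv /tmp/test_fail.csv; do
  # wc -c counts bytes, not characters; $(( )) drops any padding.
  size=$(( $(wc -c < "$f") ))
  # iconv exits non-zero if the input is not valid UTF-8.
  if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null; then valid=yes; else valid=no; fi
  if [ "$size" -gt 65536 ]; then big=yes; else big=no; fi
  echo "$f: $size bytes, valid UTF-8: $valid, over 65536 bytes: $big"
done
```

If both files report valid UTF-8 and only test_fail.csv is over 65536 bytes, that is consistent with the observation that the data itself is fine and the failure depends on file size, although it does not by itself show where in COPY the error is raised.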

Attachment: unknown_filename (text/html, 6.1 KB)
