Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-11-18 10:12:43
Message-ID: CALDaNm3eV0KSqPrykti_5eZ4TLU0C7Gph0E1U-vH-QkQBBLHrQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 31, 2020 at 2:07 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> I've done a bit more testing today, and I think the parsing is busted in
> some way. Consider this:
>
> test=# create extension random;
> CREATE EXTENSION
>
> test=# create table t (a text);
> CREATE TABLE
>
> test=# insert into t select random_string(random_int(10, 256*1024))
> from generate_series(1,10000);
> INSERT 0 10000
>
> test=# copy t to '/mnt/data/t.csv';
> COPY 10000
>
> test=# truncate t;
> TRUNCATE TABLE
>
> test=# copy t from '/mnt/data/t.csv';
> COPY 10000
>
> test=# truncate t;
> TRUNCATE TABLE
>
> test=# copy t from '/mnt/data/t.csv' with (parallel 2);
> ERROR: invalid byte sequence for encoding "UTF8": 0x00
> CONTEXT: COPY t, line 485: "m&\nh%_a"%r]>qtCl:Q5ltvF~;2oS6(at)HB
>F>og,bD$Lw'nZY\tYl#BH\t{(j~ryoZ08"SGU~(dot)}8CcTRk1\ts$(at)U3szCC+U1U3i@P..."
> parallel worker
>
>
> The functions come from an extension I use to generate random data, I've
> pushed it to github [1]. The random_string() generates a random string
> with ASCII characters, symbols and a couple special characters (\r\n\t).
> The intent was to try loading data where fields may span multiple 64kB
> blocks and may contain newlines etc.
>
> The non-parallel copy works fine, the parallel one fails. I haven't
> investigated the details, but I guess it gets confused about where a
> string starts/ends, or something like that.
>

Thanks for identifying this issue. It is fixed in the v10 patch posted
at [1].
[1]
https://www.postgresql.org/message-id/CALDaNm05FnA-ePvYV_t2%2BWE_tXJymbfPwnm%2Bkc9y1iMkR%2BNbUg%40mail.gmail.com
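For readers following the thread, here is one way chunked parsing can go wrong. When input is split into fixed-size blocks, whether a byte ends a line can depend on bytes in the previous block, so any scanner state must be carried across the block boundary. The sketch below is an illustration only, not code from the parallel copy patch, and the names LineScanState and scan_block are hypothetical. Since the table above was dumped with a plain COPY TO (text format, despite the .csv file name), it assumes text-format rules, where a backslash followed by a newline represents a data newline rather than a line end. CSV format would need to track quote state instead.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical state carried from one input block to the next while
 * looking for line boundaries in COPY text-format data.
 */
typedef struct LineScanState
{
    bool   pending_backslash; /* previous block ended with an unescaped '\' */
    size_t block_start;       /* absolute offset of the next block's first byte */
} LineScanState;

/*
 * Scan one block of input. Every '\n' not escaped by a preceding backslash
 * is treated as a line terminator and its absolute offset is stored in
 * out[]. At most max_out offsets are stored; the return value is how many
 * were stored. Assumes '\n' line endings and ignores '\r'.
 */
static size_t
scan_block(LineScanState *st, const char *buf, size_t len,
           size_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (st->pending_backslash)
        {
            /* This byte is escaped, even if it is '\n' or another '\'. */
            st->pending_backslash = false;
            continue;
        }
        if (buf[i] == '\\')
            st->pending_backslash = true;
        else if (buf[i] == '\n' && found < max_out)
            out[found++] = st->block_start + i;
    }
    st->block_start += len;
    return found;
}
```

Feeding the 15-byte input "one\\\ntwo\nthree\n" through scan_block in 4-byte blocks reports line ends at offsets 8 and 14. A scanner that reset pending_backslash at the start of each block would also report offset 4, splitting the first line in the middle, which is the kind of boundary confusion that can hand a worker a wrong start or end offset.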

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
