Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Damir Belyalov <dam(dot)bel07(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Date: 2022-08-15 12:29:10
Message-ID: e78a86704ab2b2b967f0d674e5a72643@oss.nttdata.com
Lists: pgsql-hackers

On 2022-07-19 21:40, Damir Belyalov wrote:
> Hi!
>
> Improved my patch by adding block subtransactions.
> The block size is determined by the REPLAY_BUFFER_SIZE parameter.
> I used the idea of a buffer for accumulating tuples in it.
> If we read REPLAY_BUFFER_SIZE rows without errors, the subtransaction
> will be committed.
> If we find an error, the subtransaction will rollback and the buffer
> will be replayed containing tuples.

Thanks for working on this!

I tested 0002-COPY-IGNORE_ERRORS.patch and ran into some unexpected behavior.

I loaded 10000 rows, one of which was invalid.
I expected to see 9999 rows after COPY, but saw only 999.

When I changed MAX_BUFFERED_TUPLES from 1000 to other values, the
number of loaded rows changed as well, so I suspect MAX_BUFFERED_TUPLES
is involved in this behavior.

```sh
$ cat /tmp/test10000.dat

1 aaa
2 aaa
3 aaa
4 aaa
5 aaa
6 aaa
7 aaa
8 aaa
9 aaa
10 aaa
11 aaa
...
9994 aaa
9995 aaa
9996 aaa
9997 aaa
9998 aaa
9999 aaa
xxx aaa
```
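For anyone wanting to reproduce this, a file shaped like the one above could be regenerated roughly as follows. This is only a sketch: the tab delimiter is an assumption (it is the default for COPY's text format, and the listing above does not show which whitespace was used), and the `out` variable is an illustrative name, not something from the patch.

```sh
# Hypothetical helper: regenerate a file shaped like /tmp/test10000.dat.
# Assumption: columns are tab-delimited (COPY's default text format).
out="${out:-/tmp/test10000.dat}"

# 9999 valid rows of the form "<id><TAB>aaa"
seq 1 9999 | awk '{ printf "%d\taaa\n", $1 }' > "$out"

# One invalid row at the end: a non-integer value in the int column
printf 'xxx\taaa\n' >> "$out"

# Sanity check: print the number of lines written
wc -l < "$out"
```

Placing the bad row last means every preceding row is valid, so any rows missing after COPY cannot be explained by the invalid input itself.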

```SQL
=# CREATE TABLE test (id int, data text);

=# COPY test FROM '/tmp/test10000.dat' WITH (IGNORE_ERRORS);
WARNING: COPY test, line 10000, column i: "xxx"
COPY 9999

=# SELECT COUNT(*) FROM test;
 count
-------
   999
(1 row)
```

BTW I may be overlooking it, but have you submitted this proposal to the
next CommitFest?

https://commitfest.postgresql.org/39/

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
