Re: Conflict handling for COPY FROM

From: Anthony Nowocien <anowocien(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict handling for COPY FROM
Date: 2019-07-14 08:15:17
Message-ID: CAH5RRoNctxFUupW8EpxCu+_nz6vda1xFZqLv-f-8MK+YQmkZ9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

sorry for answering a bit later than I hoped. Here is my review so far:

Contents

======

This patch starts to address in my opinion one of COPY's shortcoming, which
is error handling. PK and exclusion errors are taken care of, but not
(yet?) other types of errors.

Documentation is updated, "\h copy" also and some regression tests are
added.

Initial Run

=======

Patch applies (i've tested v6) cleanly.

make: OK

make install: OK

make check: OK

make installcheck: OK

Performance

========

I've tested the patch on a 1.1G file with 10 000 000 lines. Each test was
done 15 times on a small local VM. Table is without constraints.

head: 38,93s

head + patch: 38,76s

Another test was one a 0.1GB file with 1 000 000 lines. Each test done 10
times on a small local VM and the table has a pk.

COPY 4,550s

COPY CONFLICT 4,595s

COPY CONFLICT with only one pk error 10,529s

COPY CONFLICT pk error every 100 lines 10,859s

COPY CONFLICT pk error every 1000 lines 10,879s

I did not test exclusions so far.

Thoughts

======

I find the feature useful in itself. One big question for me is can it be
improved later on to handle other types of errors (like check constraints
for example) ? A "-1" for the error limit would be very useful in my
opinion.

I am also afraid that the name "error_limit" might mislead users into
thinking that all error types are handled. But I do not have a better
suggestion without making this clause much longer...

I've had a short look at the code, but this will need review by someone
else.

Anyway, thanks a lot for taking the time to work on it.

Anthony

On Sun, Jul 14, 2019 at 3:48 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen <surafel3000(at)gmail(dot)com>
> wrote:
> > Here are the patch that contain all the comment given except adding a
> way to specify
> > to ignoring all error because specifying a highest number can do the
> work and may be
> > try to store such badly structure data is a bad idea
>
> Hi Surafel,
>
> FYI GCC warns:
>
> copy.c: In function ‘CopyFrom’:
> copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> (void) dest->receiveSlot(myslot, dest);
> ^
> copy.c:2702:16: note: ‘dest’ was declared here
> DestReceiver *dest;
> ^
>
> --
> Thomas Munro
> https://enterprisedb.com
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-07-14 08:50:12 Re: pgbench - implement strict TPC-B benchmark
Previous Message Alexander Lakhin 2019-07-14 05:24:01 Fix typos and inconsistencies for HEAD (take 6)