Re: Conflict handling for COPY FROM

From: Anthony Nowocien <anowocien(at)gmail(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict handling for COPY FROM
Date: 2019-07-03 13:24:55
Message-ID: CAH5RRoO_2n1PYuONuA4auhYa1xofv+AzvKVDU94XTDZ=+EiwvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
I'm very interested in this patch and would like to give a review within a
week. On the feature side, how about simply using the less verbose "ERRORS"
instead of "ERROR LIMIT" ?

On Wed, Jul 3, 2019 at 1:42 PM Surafel Temesgen <surafel3000(at)gmail(dot)com>
wrote:

> Hi Alexey,
> Thank you for looking at it
>
> On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <
> a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>
>> On 28.06.2019 16:12, Alvaro Herrera wrote:
>> >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>> >>> Or even just return it as a row. CopyBoth is relatively widely
>> supported
>> >>> these days.
>> >> i think generating warning about it also sufficiently meet its propose
>> of
>> >> notifying user about skipped record with existing logging facility
>> >> and we use it for similar propose in other place too. The different
>> >> i see is the number of warning that can be generated
>> > Warnings seem useless for this purpose. I'm with Andres: returning rows
>> > would make this a fine feature. If the user wants the rows in a table
>> > as Andrew suggests, she can use wrap the whole thing in an insert.
>>
>> I agree with previous commentators that returning rows will make this
>> feature more versatile.
>
>
> I agree. am looking at the options
>
> Also, I would prefer having an option to ignore all errors, e.g. with
>> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
>> a number of future errors if you are playing with some badly structured
>> data, while always setting it to 100500k looks ugly.
>>
>>
> Good idea
>
> I also +1 having an option to ignore all errors. Other RDBMS might use a
large number, but "-1" seems cleaner so far.

>
>
>> 1) Calculation of processed rows isn't correct (I've checked). You do it
>> in two places, and
>>
>> - processed++;
>> + if (!cstate->error_limit)
>> + processed++;
>>
>> is never incremented if ERROR_LIMIT is specified and no errors
>> occurred/no constraints exist, so the result will always be 0. However,
>> if primary column with constraints exists, then processed is calculated
>> correctly, since another code path is used:
>>
>>
> Correct. i will fix
>
> + if (specConflict)
>> + {
>> + ...
>> + }
>> + else
>> + processed++;
>>
>> I would prefer this calculation in a single place (as it was before
>> patch) for simplicity and in order to avoid such problems.
>>
>>
> ok
>
>
>> 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
>> is specified and was exceeded, which doesn't seem to be correct, does it?
>>
>> - if (resultRelInfo->ri_NumIndices > 0)
>> + if (resultRelInfo->ri_NumIndices > 0 &&
>> cstate->error_limit == 0)
>> recheckIndexes =
>> ExecInsertIndexTuples(myslot,
>>
>>
> No it alwase executed . I did it this way to avoid
> inserting index tuple twice but i see its unlikely
>
>
>> 3) Trailing whitespaces added to error messages and tests for some reason:
>>
>> + ereport(WARNING,
>> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>> + errmsg("skipping \"%s\" --- missing data
>> for column \"%s\" ",
>>
>> + ereport(ERROR,
>> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>> + errmsg("missing data for column \"%s\" ",
>>
>> -ERROR: missing data for column "e"
>> +ERROR: missing data for column "e"
>> CONTEXT: COPY x, line 1: "2000 230 23 23"
>>
>> -ERROR: missing data for column "e"
>> +ERROR: missing data for column "e"
>> CONTEXT: COPY x, line 1: "2001 231 \N \N"
>>
>>
>
> regards
> Surafel
>

Thanks,
Anthony

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-07-03 13:38:16 Re: TopoSort() fix
Previous Message John Naylor 2019-07-03 12:14:25 Re: benchmarking Flex practices