Quick Links

Re: Conflict handling for COPY FROM

From:	Surafel Temesgen <surafel3000(at)gmail(dot)com>
To:	Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Conflict handling for COPY FROM
Date:	2019-07-03 11:42:00
Message-ID:	CALAY4q8rs26AOHf48mC67bGgN4yHfrycnq5BgihyVpUe7hypyQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Alexey,
Thank you for looking at it

On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
wrote:

> On 28.06.2019 16:12, Alvaro Herrera wrote:
> >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >>> Or even just return it as a row. CopyBoth is relatively widely
> supported
> >>> these days.
> >> i think generating warning about it also sufficiently meet its propose
> of
> >> notifying user about skipped record with existing logging facility
> >> and we use it for similar propose in other place too. The different
> >> i see is the number of warning that can be generated
> > Warnings seem useless for this purpose. I'm with Andres: returning rows
> > would make this a fine feature. If the user wants the rows in a table
> > as Andrew suggests, she can use wrap the whole thing in an insert.
>
> I agree with previous commentators that returning rows will make this
> feature more versatile.

I agree. am looking at the options

Also, I would prefer having an option to ignore all errors, e.g. with
> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
> a number of future errors if you are playing with some badly structured
> data, while always setting it to 100500k looks ugly.
>
>
Good idea

> 1) Calculation of processed rows isn't correct (I've checked). You do it
> in two places, and
>
> - processed++;
> + if (!cstate->error_limit)
> + processed++;
>
> is never incremented if ERROR_LIMIT is specified and no errors
> occurred/no constraints exist, so the result will always be 0. However,
> if primary column with constraints exists, then processed is calculated
> correctly, since another code path is used:
>
>
Correct. i will fix

+ if (specConflict)
> + {
> + ...
> + }
> + else
> + processed++;
>
> I would prefer this calculation in a single place (as it was before
> patch) for simplicity and in order to avoid such problems.
>
>
ok

> 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
> is specified and was exceeded, which doesn't seem to be correct, does it?
>
> - if (resultRelInfo->ri_NumIndices > 0)
> + if (resultRelInfo->ri_NumIndices > 0 &&
> cstate->error_limit == 0)
> recheckIndexes =
> ExecInsertIndexTuples(myslot,
>
>
No it alwase executed . I did it this way to avoid
inserting index tuple twice but i see its unlikely

> 3) Trailing whitespaces added to error messages and tests for some reason:
>
> + ereport(WARNING,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("skipping \"%s\" --- missing data
> for column \"%s\" ",
>
> + ereport(ERROR,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("missing data for column \"%s\" ",
>
> -ERROR: missing data for column "e"
> +ERROR: missing data for column "e"
> CONTEXT: COPY x, line 1: "2000 230 23 23"
>
> -ERROR: missing data for column "e"
> +ERROR: missing data for column "e"
> CONTEXT: COPY x, line 1: "2001 231 \N \N"
>
>

regards
Surafel

In response to

Re: Conflict handling for COPY FROM at 2019-07-02 16:57:21 from Alexey Kondratov

Responses

Re: Conflict handling for COPY FROM at 2019-07-03 13:24:55 from Anthony Nowocien

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	John Naylor	2019-07-03 12:14:25	Re: benchmarking Flex practices
Previous Message	Surafel Temesgen	2019-07-03 10:40:18	Re: FETCH FIRST clause WITH TIES option