Re: Conflict handling for COPY FROM

From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict handling for COPY FROM
Date: 2020-03-30 12:46:13
Message-ID: CALAY4q_5y3TZYkC5nCYJr1Lufbb3kykd6KAedYyW3J3rJ_XsAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 27, 2020 at 3:27 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Surafel Temesgen <surafel3000(at)gmail(dot)com> writes:
> > [ conflict-handling-copy-from-v16.patch ]
>
> I took a quick look at this patch, since it was marked "ready for
> committer", but I don't see how it can possibly be considered committable.
>
> 1. Covering only the errors that are thrown in DoCopy itself doesn't
> seem to me to pass the smell test. Yes, I'm sure there's some set of
> use-cases for which that'd be helpful, but I think most people would
> expect a "skip errors" option to be able to handle cases like malformed
> numbers or bad encoding. I understand the implementation reasons that
> make it impractical to cover other errors, but do we really want a
> feature that most people will see as much less than half-baked? I fear
> it'll be an embarrassment.
>
> I did small research and most major database management system didn't
claim
they handle every problem in loading file at least in every usage scenario.

> 2. If I'm reading the patch correctly, (some) rejected rows are actually
> sent back to the client. This is a wire protocol break of the first
> magnitude, and can NOT be accepted. At least not without some provisions
> for not doing it with a client that isn't prepared for it. I also am
> fairly worried about the possibilities for deadlock (ie, both ends stuck
> waiting for the other one to absorb data) if the return traffic volume is
> high enough.
>
>
if my understanding is correct copy_in occur in sub protocol inside simple
or
extended query protocol that know and handle the responds

> 3. I don't think enough thought has been put into the reporting, either.
>
> + ereport(WARNING,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("skipping \"%s\" --- extra data after
> last expected column",
> + cstate->line_buf.data)));
>
> That's not going to be terribly helpful if the input line runs to many
> megabytes. Or even if no individual line is very long, but you get
> millions of such warnings. It's pretty much at variance with our
> message style guidelines (among other things, those caution you to keep
> the primary error message short); and it's randomly different from
> COPY's existing practice, which is to show the faulty line as CONTEXT.
> Plus it seems plenty weird that some errors are reported this way while
> others are reported by sending back the bad tuple (with, it looks like,
> no mention of what the specific problem is ... what if you have a lot of
> unique indexes?).
>
> Currently we can’t get problem description in speculative insertion
infrastructure and am afraid adding problem description to return tuple
will make the data less usable without further processing.Warning raised
for error that happen before tuple contracted. Other option is to skip those
record silently but reporting to user give user the chance to correct it.

> BTW, while I don't know much about the ON CONFLICT (speculative
> insertion) infrastructure, I wonder how well it really works to
> not specify an arbiter index. I see that you're getting away with
> it in a trivial test case that has exactly one index, but that's
> not stressing the point very hard.
>
> If arbiter index is not specified that means all indexes as the comment in
ExecCheckIndexConstraints stated

/* ----------------------------------------------------------------

* ExecCheckIndexConstraints

*

* This routine checks if a tuple violates any unique or

* exclusion constraints. Returns true if there is no conflict.

* Otherwise returns false, and the TID of the conflicting

* tuple is returned in *conflictTid.

*

* If 'arbiterIndexes' is given, only those indexes are checked.

* NIL means all indexes.

regards
Surafel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ahsan Hadi 2020-03-30 12:58:18 Re: WIP/PoC for parallel backup
Previous Message Julien Rouhaud 2020-03-30 12:43:56 Re: WAL usage calculation patch