Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Anastasia Lubennikova <lubennikovaAV(at)gmail(dot)com>
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-06-16 18:30:29
Message-ID: D31F460B-317C-405F-B591-BE752CE8D90E@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for a previous email, I have accidentally sent it unfinished.

> On 13 Jun 2017, at 01:44, Peter Geoghegan <pg(at)bowt(dot)ie <mailto:pg(at)bowt(dot)ie>> wrote:
>
> Speculative insertion has the following special entry points to
> heapam.c and execIndexing.c, currently only called within
> nodeModifyTable.c
>
> Offhand, it doesn't seem like it would be that hard to teach another
> heap_insert() caller the same tricks.

I went through the nodeModifyTable.c code and it seems not to be so
difficult to do the same inside COPY.

> My sense is that it's going to be hard to sell a committer on any
> design that consumes subtransactions in a way that's not fairly
> obvious to the user, and doesn't have a pretty easily understood worse
> case.

Yes, and worse case probably will be a quite frequent case, since it is not
possible to do heap_multi_insert, when BEFORE/INSTEAD triggers or partitioning
exist (according to the current copy.c code). Thus, it will frequently fall back
into a single heap_insert, each being wrapped with subtransaction will
consume XIDs too greedy and seriously affect performance. I like my initial
idea less and less.

By the way, is it possible to use heap_multi_insert with speculative insertion too?

> I haven't thought about this very carefully, but I guess you could do
> something like passing a flag to ExecConstraints() that indicates
> "don't throw an error; instead, just return false so I know not to
> proceed"

Currently, ExecConstraints always throws an error and I do not think, that
it would be wise from my side to modify its behaviour.

I have updated my patch (rebased over the topmost master commit
94da2a6a9a05776953524424a3d8079e54bc5d94). Please, find patch
file attached or always up to date version on GitHub
https://github.com/ololobus/postgres/pull/1/files <https://github.com/ololobus/postgres/pull/1/files>

It catches all major errors in the input data:

1) Rows with less/extra columns cause WARNINGs and are skipped

2) I found that input type format errors are thrown from the
InputFunctionCall; and wrapped it up with PG_TRY/CATCH.

I am not sure that it is 100% transactionally safe, but it seems so,
since all these errors are handled before this point
https://github.com/postgres/postgres/blob/651902deb1551db8b401fdeab9bdb8a61cee7f9f/src/backend/commands/copy.c#L2628 <https://github.com/postgres/postgres/blob/651902deb1551db8b401fdeab9bdb8a61cee7f9f/src/backend/commands/copy.c#L2628>,
where current COPY implementation has a mechanism to skip tuple.
I use the same skip_tuple flag.

Patch passes all regression tests, excepting a few tests due to the slightly
changed error message texts.

Now, I think that it may be a good idea to separate all possible errors
into two groups:
– Malformed input data
– DB conflicts during insertion

First is solved (I hope) well with the current patch. I can add, e.g.
MAXERRORS flag to COPY, which will limit number of errors.

Second may be solved with speculative insertion using the same
syntax ON CONFLICT DO as in INSERT statement.

Following this way, we do not use subtransactions at all; and keeping
predictable and consistent behaviour of INSERT and COPY along the
database. For me it sounds much better, than just swallowing all errors
without a difference and any logic.

Alexey

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2017-06-16 18:42:38 Re: Preliminary results for proposed new pgindent implementation
Previous Message Tom Lane 2017-06-16 18:23:00 Re: Preliminary results for proposed new pgindent implementation