Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Стас <stas(dot)kelvich(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-04-10 18:46:29
Message-ID: 02B8154C-58ED-4E9A-8047-84BD261620FB@gmail.com
Lists: pgsql-hackers

Yes, sure, I don't doubt it. My question was about step 4 of the following possible algorithm:

1. Suppose we have to insert N records
2. Start a subtransaction with these N records
3. An error is raised on the k-th line
4. Then we know that we can safely insert all lines from the 1st to the (k - 1)-th
5. Report the k-th line, save it to an errors table, or silently drop it
6. Next, try to insert lines from the (k + 1)-th to the N-th with another subtransaction
7. Repeat until the end of the file
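The steps above can be sketched in Python, using SQLite's SAVEPOINT / ROLLBACK TO / RELEASE as a stand-in for PostgreSQL subtransactions. This is an assumption for illustration only: the real COPY code lives in C inside the server, and the table `t`, `INSERT_SQL` and `copy_replaying_prefix` are hypothetical names. The sketch resolves step 4 with the first option discussed in the next paragraph: after an error it aborts the subtransaction and replays the known-good prefix.

```python
import sqlite3

# Hypothetical target table standing in for the COPY destination.
INSERT_SQL = "INSERT INTO t (id, val) VALUES (?, ?)"


def copy_replaying_prefix(conn, rows):
    """Insert rows, skipping bad ones; return (index, message) per skipped row.

    Assumes conn was opened with isolation_level=None, so BEGIN and SAVEPOINT
    are issued explicitly. A SAVEPOINT plays the role of a subtransaction.
    """
    errors = []
    start = 0  # first row of the current batch (step 2)
    conn.execute("BEGIN")
    while start < len(rows):
        conn.execute("SAVEPOINT batch")
        failed_at = None
        for i in range(start, len(rows)):
            try:
                conn.execute(INSERT_SQL, rows[i])
            except sqlite3.Error as exc:  # step 3: error on the k-th row
                failed_at = i
                errors.append((i, str(exc)))  # step 5: report the bad row
                break
        if failed_at is None:
            conn.execute("RELEASE batch")  # the rest of the input was clean
            break
        # Recover only by aborting the subtransaction. This also discards rows
        # start..failed_at-1. (SQLite itself would let the transaction go on
        # after a failed statement; we roll back anyway to mirror PostgreSQL.)
        conn.execute("ROLLBACK TO batch")
        # Step 4: the prefix is known to be good, so replay it. ROLLBACK TO
        # leaves the savepoint open, so the replay runs inside it.
        for i in range(start, failed_at):
            conn.execute(INSERT_SQL, rows[i])
        conn.execute("RELEASE batch")
        start = failed_at + 1  # step 6: continue after the bad row
    conn.execute("COMMIT")
    return errors


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
rows = [(1, "a"), (2, "b"), (2, "dup"), (3, "c"), (3, "dup"), (4, "d")]
errors = copy_replaying_prefix(conn, rows)
print([i for i, _ in errors])  # [2, 4]
print(conn.execute("SELECT id, val FROM t ORDER BY id").fetchall())
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

The cost of this choice is that every good row in a batch that ends in an error is inserted twice: once before the error and once in the replay.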

One option is to start a subtransaction with those (k - 1) safe lines and repeat this after each error line
OR
to iterate until the end of the file and then start only one subtransaction with all lines except the error lines.
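The second option (find the error lines first, then insert everything else in a single subtransaction) can be sketched under the same assumptions: SQLite savepoints stand in for PostgreSQL subtransactions, and the table `t`, `INSERT_SQL`, `copy_single_final_subxact` and `fresh_conn` are hypothetical. Every probing subtransaction is aborted rather than recovered from in place.

```python
import sqlite3

# Hypothetical target table standing in for the COPY destination.
INSERT_SQL = "INSERT INTO t (id, val) VALUES (?, ?)"


def copy_single_final_subxact(conn, rows):
    """Two-phase variant: find bad rows, then insert all others at once.

    Assumes conn was opened with isolation_level=None.
    Returns the set of skipped row indexes.
    """
    bad = set()
    start = 0
    conn.execute("BEGIN")
    # Phase 1: probe in subtransactions that are always rolled back.
    while start < len(rows):
        conn.execute("SAVEPOINT probe")
        failed_at = None
        for i in range(start, len(rows)):
            try:
                conn.execute(INSERT_SQL, rows[i])
            except sqlite3.Error:
                failed_at = i
                break
        conn.execute("ROLLBACK TO probe")
        conn.execute("RELEASE probe")
        if failed_at is None:
            break
        bad.add(failed_at)
        start = failed_at + 1
    # Phase 2: one subtransaction with every row except the bad ones.
    # A conflict between rows checked in different probes is only detected
    # here; this sketch lets that error propagate.
    conn.execute("SAVEPOINT final")
    for i, row in enumerate(rows):
        if i not in bad:
            conn.execute(INSERT_SQL, row)
    conn.execute("RELEASE final")
    conn.execute("COMMIT")
    return bad


def fresh_conn():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    return conn


rows = [(1, "a"), (2, "b"), (2, "dup"), (3, "c"), (3, "dup"), (4, "d")]
print(sorted(copy_single_final_subxact(fresh_conn(), rows)))  # [2, 4]

# Index 3 duplicates index 0, but the two rows are checked in different probes.
conflicting_rows = [(1, "a"), (2, "b"), (2, "x"), (1, "y")]
try:
    copy_single_final_subxact(fresh_conn(), conflicting_rows)
except sqlite3.IntegrityError as exc:
    print("final subtransaction failed:", exc)
```

Because an aborted probe leaves no rows behind, later probes cannot see rows inserted by earlier ones. In `conflicting_rows` the duplicate id 1 therefore passes phase 1 and only fails in the final subtransaction, a case this sketch does not handle.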

Alexey

> On 10 Apr 2017, at 19:55, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Apr 10, 2017 at 11:39 AM, Alex K <kondratov(dot)aleksey(at)gmail(dot)com> wrote:
>> (1) It seems that starting a new subtransaction at step 4 is not necessary. We
>> can just gather all error lines in one pass and, at the end of the input, start
>> only one additional subtransaction with all safe lines at once: [1, ...,
>> k1 - 1, k1 + 1, ..., k2 - 1, k2 + 1, ...], where ki is the i-th error line number.
>
> The only way to recover from an error is to abort the subtransaction,
> or to abort the toplevel transaction. Anything else is unsafe.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company