Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Alex K <kondratov(dot)aleksey(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Anastasia Lubennikova <lubennikovaAV(at)gmail(dot)com>
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-06-21 14:37:44
Message-ID: CADfU8WygFBs5Vv8PhheP0sOfROLaVU1=0PXHmUUO1C_w1iMNdQ@mail.gmail.com
Lists: pgsql-hackers

> On 16 Jun 2017, at 21:30, Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com> wrote:

> > On 13 Jun 2017, at 01:44, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> > Speculative insertion has the following special entry points to
> > heapam.c and execIndexing.c, currently only called within
> > nodeModifyTable.c

> > Offhand, it doesn't seem like it would be that hard to teach another
> > heap_insert() caller the same tricks.

> I went through the nodeModifyTable.c code and it seems not to be so
> difficult to do the same inside COPY.

After a closer look, I have found at least one difficulty: COPY and INSERT
follow different execution paths. INSERT goes through the planner, while
COPY does not. As a result, COPY lacks some required attributes, such as
arbiterIndexes, which are available during INSERT via
PlanState/ModifyTableState. It is probably possible to obtain the same
information in COPY, but it is not yet clear to me how.

Anyway, adding 'speculative insertion' to COPY deserves a separate patch,
and I would be glad to try implementing it.
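For readers less familiar with speculative insertion, the per-row protocol COPY would have to follow can be modelled roughly as below. This is a toy, single-process sketch, not PostgreSQL code: ToyHeap, toy_conflict_exists() and toy_copy_row_do_nothing() are hypothetical names, the "unique index" is a linear scan, and the step comments only loosely mirror the real sequence (pre-check against the arbiter indexes, speculative heap insert, index insertion, then heap_finish_speculative() or heap_abort_speculative()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only (hypothetical): a fixed-size "heap" whose unique
 * "index" is simply a linear scan over the stored tuples. */
#define TOY_MAX_ROWS 64

typedef struct {
    int  key;
    bool speculative;   /* heap tuple written, not yet confirmed */
    bool dead;          /* speculative insertion was aborted */
} ToyTuple;

typedef struct {
    ToyTuple rows[TOY_MAX_ROWS];
    size_t   nrows;
} ToyHeap;

/* Does any live tuple other than the one at `skip` already hold `key`? */
static bool toy_conflict_exists(const ToyHeap *h, int key, size_t skip)
{
    for (size_t i = 0; i < h->nrows; i++)
        if (i != skip && !h->rows[i].dead && h->rows[i].key == key)
            return true;
    return false;
}

/* COPY-side analogue of INSERT ... ON CONFLICT DO NOTHING for one row.
 * Returns true if the row was stored, false if it was skipped. */
static bool toy_copy_row_do_nothing(ToyHeap *h, int key)
{
    /* 1. Pre-check against the arbiter (unique) index. */
    if (toy_conflict_exists(h, key, SIZE_MAX))
        return false;

    /* 2. Speculatively insert the heap tuple. */
    assert(h->nrows < TOY_MAX_ROWS && "toy heap is full");
    size_t slot = h->nrows++;
    h->rows[slot] = (ToyTuple){ .key = key, .speculative = true, .dead = false };

    /* 3. Insert index entries, re-checking for a conflict. In this
     *    single-process toy the re-check never fires; in the server it
     *    catches a row inserted concurrently after the pre-check. */
    if (toy_conflict_exists(h, key, slot)) {
        /* 4a. Abort: the tuple becomes invisible (cf. heap_abort_speculative). */
        h->rows[slot].dead = true;
        return false;
    }

    /* 4b. Confirm the tuple (cf. heap_finish_speculative). */
    h->rows[slot].speculative = false;
    return true;
}
```

Returning false after an abort is a simplification: the server instead retries from the pre-check, which then finds the conflicting row and skips it. Step 1 is where the arbiter index list is needed, which is exactly the information COPY lacks because it bypasses the planner.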

In the meantime, I have prepared a complete working patch with:

- ignoring of input data format errors
- an IGNORE_ERRORS parameter in the COPY options
- updated regression tests
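The behaviour of the first item can be pictured with a small standalone C sketch: malformed input lines are counted and skipped when error ignoring is enabled, and loading stops at the first bad line otherwise. The "<integer>,<text>" row format, Row, parse_row() and load_rows() are hypothetical and chosen only for illustration; the sketch says nothing about how the patch itself traps errors inside the server.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row format for this sketch: "<integer id>,<text>". */
typedef struct {
    long id;
    char name[32];
} Row;

/* Parse one input line; report a format error instead of aborting. */
static bool parse_row(const char *line, Row *out)
{
    const char *comma = strchr(line, ',');
    if (comma == NULL || comma == line)
        return false;                   /* missing separator or empty id */

    char *end;
    errno = 0;
    long id = strtol(line, &end, 10);
    if (errno != 0 || end != comma)
        return false;                   /* id field is not a clean integer */

    size_t len = strlen(comma + 1);
    if (len >= sizeof out->name)
        return false;                   /* text field too long for this toy row */

    out->id = id;
    memcpy(out->name, comma + 1, len + 1);
    return true;
}

/*
 * Load lines into rows. With ignore_errors (mirroring IGNORE_ERRORS only in
 * spirit), malformed lines are counted in *nskipped and skipped; without it,
 * loading stops at the first bad line and -1 is returned, like a failing COPY
 * (the caller should then discard any rows already stored).
 */
static long load_rows(const char *const *lines, size_t nlines,
                      bool ignore_errors, Row *rows, size_t *nskipped)
{
    assert(lines != NULL && rows != NULL && nskipped != NULL);

    size_t nloaded = 0;
    *nskipped = 0;
    for (size_t i = 0; i < nlines; i++) {
        Row r;
        if (!parse_row(lines[i], &r)) {
            if (!ignore_errors)
                return -1;
            (*nskipped)++;
            continue;
        }
        rows[nloaded++] = r;
    }
    return (long) nloaded;
}
```

For example, with ignore_errors set, the lines "1,alice", "oops", "2,bob" and "x,carol" yield two rows and two skipped lines; with it unset, load_rows() returns -1 at "oops".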

Please find the patch attached, or check the web UI diff on GitHub as usual:
https://github.com/ololobus/postgres/pull/1/files

Alexey

Attachment: copy-errors-v1.0.diff.zip (application/zip, 8.5 KB)
