Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Anastasia Lubennikova <lubennikovaAV(at)gmail(dot)com>
Subject: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2018-01-23 02:21:22
Message-ID: 20180123022122.GD2416@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexey,

* Alexey Kondratov (kondratov(dot)aleksey(at)gmail(dot)com) wrote:
> On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > On a *very* quick look, please use an enum to return from NextCopyFrom
> > rather than 'int'. The chunks that change bool to int are very
> > odd-looking. This would move the comment that explains the value from
> > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> > in the descriptions of those values; please don't.
>
> I will fix it, thank you.
>
> > Or maybe I misunderstood the patch completely.
>
> I hope so. Here is my thoughts how it all works, please correct me,
> where I am wrong:
>
> 1) First, I have simply changed ereport level to WARNING for specific
> validations (extra or missing columns, etc) if INGONE_ERRORS option is
> used. All these checks are inside NextCopyFrom. Thus, this patch
> performs here pretty much the same as before, excepting that it is
> possible to skip bad lines, and this part should be safe as well
>
> 2) About PG_TRY/CATCH. I use it to catch the only one specific
> function call inside NextCopyFrom--it is InputFunctionCall--which is
> used just to parse datatype from the input string. I have no idea how
> WAL write or trigger errors could get here
>
> All of these is done before actually forming a tuple, putting it into
> the heap, firing insert-related triggers, etc. I am not trying to
> catch all errors during the row processing, only input data errors. So
> why is it unsafe?

The issue is that InputFunctionCall is actually calling out to other
functions and they could be allocating memory and doing other things.
What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
is all the cleanup work that AbortTransaction() and friends do- ensuring
there's a valid memory context (cleaning up the ones that might have
been allocated by the input function), releasing locks, resetting user
and security contexts, and a whole slew of other things.

Is all of that always needed for an error thrown by an input function?
Hard to say, really, since input functions can be provided by
extensions. You might get away with doing this for int4in(), but we
also have to be thinking about extensions like PostGIS which introduce
their own geometry and geography data types that are a great deal more
complicated.

In the end, this isn't an acceptable approach to this problem. The
approach outlined previously using sub-transactions which attempted
insert some number of records and then, on failure, restarted and
inserted up until the failed record and then skipped it, starting over
again with the next batch, seemed pretty reasonable to me. If you have
time to work on that, that'd be great, otherwise we'll probably need to
mark this as returned with feedback.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-01-23 02:24:25 Re: Handling better supported channel binding types for SSL implementations
Previous Message Michael Paquier 2018-01-23 02:07:33 Re: [HACKERS] A design for amcheck heapam verification