Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Anastasia Lubennikova <lubennikovaAV(at)gmail(dot)com>
Subject: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2018-01-23 02:33:01
Message-ID: CAMsr+YETfrNbrnuiSjSZq8rhWiFiCKAzNz62iBmoehhd+kpgAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 January 2018 at 15:21, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Alexey,
>
> * Alexey Kondratov (kondratov(dot)aleksey(at)gmail(dot)com) wrote:
> > On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
> > > On a *very* quick look, please use an enum to return from NextCopyFrom
> > > rather than 'int'. The chunks that change bool to int are very
> > > odd-looking. This would move the comment that explains the value from
> > > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII
> dashes
> > > in the descriptions of those values; please don't.
> >
> > I will fix it, thank you.
> >
> > > Or maybe I misunderstood the patch completely.
> >
> > I hope so. Here is my thoughts how it all works, please correct me,
> > where I am wrong:
> >
> > 1) First, I have simply changed ereport level to WARNING for specific
> > validations (extra or missing columns, etc) if INGONE_ERRORS option is
> > used. All these checks are inside NextCopyFrom. Thus, this patch
> > performs here pretty much the same as before, excepting that it is
> > possible to skip bad lines, and this part should be safe as well
> >
> > 2) About PG_TRY/CATCH. I use it to catch the only one specific
> > function call inside NextCopyFrom--it is InputFunctionCall--which is
> > used just to parse datatype from the input string. I have no idea how
> > WAL write or trigger errors could get here
> >
> > All of these is done before actually forming a tuple, putting it into
> > the heap, firing insert-related triggers, etc. I am not trying to
> > catch all errors during the row processing, only input data errors. So
> > why is it unsafe?
>
> The issue is that InputFunctionCall is actually calling out to other
> functions and they could be allocating memory and doing other things.
> What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
> is all the cleanup work that AbortTransaction() and friends do- ensuring
> there's a valid memory context (cleaning up the ones that might have
> been allocated by the input function), releasing locks, resetting user
> and security contexts, and a whole slew of other things.
>
> Is all of that always needed for an error thrown by an input function?
> Hard to say, really, since input functions can be provided by
> extensions. You might get away with doing this for int4in(), but we
> also have to be thinking about extensions like PostGIS which introduce
> their own geometry and geography data types that are a great deal more
> complicated.
>
>
For example, an input function could conceivably take a lwlock, do some
work in the heapam, or whatever.

We don't have much in the way of rules about what input functions can or
cannot do, so you can't assume much about their behaviour and what must /
must not be cleaned up. Nor can you just reset the state in a heavy handed
manner like (say) plpgsql does.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-01-23 02:35:40 Re: [PATCH] session_replication_role = replica with TRUNCATE
Previous Message Michael Paquier 2018-01-23 02:24:25 Re: Handling better supported channel binding types for SSL implementations