Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-04-12 17:57:33
Message-ID: 962E8012-78DB-421C-AFF3-A85DE39E469C@postgrespro.ru
Lists: pgsql-hackers


> On 12 Apr 2017, at 20:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 12, 2017 at 1:18 PM, Nicolas Barbier
> <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>> 2017-04-11 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> If the data quality is poor (say, 50% of lines have errors) it's
>>> almost impossible to avoid runaway XID consumption.
>>
>> Yup, that seems difficult to work around with anything similar to the
>> proposed. So the docs might need to suggest not to insert a 300 GB
>> file with 50% erroneous lines :-).
>
> Yep. But it does seem reasonably likely that someone might shoot
> themselves in the foot anyway. Maybe we just live with that.
>

Moreover, if that file consists of one-byte lines (plus one newline byte each),
XID wraparound would happen about 18 times during its import =)
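The figure of 18 can be reproduced with a rough calculation. This is a minimal sketch that relies on several assumptions, none of which are figures from the thread: "300 GB" means 300 GiB, every line is 2 bytes, exactly half the lines fail, and each failed line burns one XID out of a 32-bit counter. It also ignores the anti-wraparound vacuuming that would have to run in between.

```python
# Back-of-the-envelope XID arithmetic for the scenario discussed above.
# Assumptions (not from the thread): 300 GiB file, 1 data byte + 1 newline
# byte per line, half of the lines fail, one XID burned per failed line.
FILE_BYTES = 300 * 2**30
BYTES_PER_LINE = 2
XID_SPACE = 2**32  # the XID counter is 32 bits wide

lines = FILE_BYTES // BYTES_PER_LINE
bad_lines = lines // 2                  # 50% erroneous lines
wraps = bad_lines / XID_SPACE           # passes through the whole XID space
full_wraps = bad_lines // XID_SPACE     # complete wraparounds

print(f"{bad_lines:,} bad lines -> {wraps} XID wraparounds")
# prints "80,530,636,800 bad lines -> 18.75 XID wraparounds"
```

Reading "300 GB" as decimal gigabytes gives about 17.5 instead. Under either reading, the result is roughly 18 wraparounds.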

I think it’s reasonable at least to have something like a max_errors parameter
for COPY, set by default to, say, 1000. If the user hits that limit, it is a
good moment to warn about the possible XID consumption that a bigger limit
would cause.
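Here is one way the proposed limit could behave. This is a minimal sketch, and its names are assumptions. `max_errors` is only the option proposed in this mail, not an existing COPY parameter. `parse` is a hypothetical stand-in for COPY's per-line input conversion. The sketch also assumes each rejected line costs about one XID, so the error count doubles as a consumption estimate.

```python
class TooManyErrors(Exception):
    """Raised when the hypothetical max_errors limit is exceeded."""


def copy_with_error_limit(lines, parse, max_errors=1000):
    """Load lines, skipping bad ones, but give up after max_errors failures.

    Hypothetical sketch: `parse` models per-line input conversion and raises
    ValueError on bad input; each skipped line is assumed to cost ~1 XID.
    """
    rows = []
    errors = 0
    for lineno, line in enumerate(lines, start=1):
        try:
            rows.append(parse(line))
        except ValueError:
            errors += 1
            if errors > max_errors:
                # The point where the mail suggests warning the user.
                raise TooManyErrors(
                    f"line {lineno}: more than {max_errors} bad lines; "
                    f"a bigger max_errors may consume ~1 XID per bad line"
                )
    return rows, errors
```

For example, `copy_with_error_limit(["1", "x", "3"], int, max_errors=1)` skips the unparsable `"x"` and returns `([1, 3], 1)`. With two bad lines and the same limit, it raises `TooManyErrors` instead.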

However, I think it is worth a quick investigation into whether we can create a
special code path for COPY in which errors don’t cancel the transaction, at
least when COPY is called outside of a transaction block.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
