Re: COPY enhancements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 15:01:53
Message-ID: 23958.1255014113@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Lest there be any unclarity, I am NOT trying to shoot down this
> feature with my laser-powered bazooka.

Well, if you need somebody to do that --- I took a quick look through
this patch, and it is NOT going to get committed. Not in anything
approximately like its current form. The questions about how the
logging should act don't come anywhere near addressing the fundamental
problem with the patch, which is that IT DOESN'T WORK. You can *not*
suppose that you can just put a PG_TRY block around some processing
and catch any random error and keep going. I see that the patch tries
to avoid this by only catching certain major errcode categories, which
merely makes it useless while still being untrustworthy.
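To make that concrete, the pattern in question has roughly this shape (a
sketch of the general idiom, not the patch's actual code; "oldcontext" is
assumed to be the memory context the caller was in before entering the loop):

    PG_TRY();
    {
        /* parse the line, fire BEFORE INSERT triggers, heap_insert the
         * tuple, insert index entries ... */
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        /* must get out of ErrorContext before copying the error data */
        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();          /* discard the error and keep going */

        /* ... record the bad line somewhere, move on to the next row ... */
    }
    PG_END_TRY();

FlushErrorState() makes the error go away, but it does nothing to undo
whatever side effects the failed code path had already performed before
the ereport(ERROR) fired.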

As an example of a case that anyone would expect to work that cannot
work with this approach, I submit unique-index violations. When the
index throws the error, the bad row has already been inserted in the
table (and maybe some other indexes too). The *only* way to clean up
is to abort the transaction/subtransaction so that the row will not be
considered good.
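The control flow that makes this unavoidable is right there in the CopyFrom
loop; heavily simplified (argument lists trimmed, names as in the 8.4-era
sources):

    heap_insert(resultRel, tuple, ...);     /* row is now physically in the heap */
    if (resultRelInfo->ri_NumIndices > 0)
        ExecInsertIndexTuples(slot, ...);   /* uniqueness is checked here; the
                                             * ereport(ERROR) fires after the heap
                                             * row, and possibly entries in other
                                             * indexes, already exist */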

More generally, since we are calling user-defined BEFORE INSERT triggers
in there, we have to assume that absolutely anything at all could have
been done by the triggers. PG_CATCH doesn't even pretend to cope with
that.

So as far as I can see, the only form of COPY error handling that
wouldn't be a cruel joke is to run a separate subtransaction for each
row, and roll back the subtransaction on error. Of course the problems
with that are (a) speed, (b) the 2^32 limit on command counter IDs
would mean a max of 2^32 rows per COPY, which is uncomfortably small
these days. Previous discussions of the problem have mentioned trying
to batch multiple rows per subtransaction to alleviate both issues.
Not easy of course, but that's why it's not been done yet. With a
patch like this you'd also have (c) the question of how to avoid rolling
back the insertions into the logging table.
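For reference, the per-row-subtransaction approach would look something like
the pattern plpgsql already uses for its exception blocks (a sketch under
that assumption, not a worked-out design):

    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        /* insert one row (or, to amortize the overhead, a batch of rows) */

        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        MemoryContextSwitchTo(oldcontext);
        FlushErrorState();

        /* undoes everything the row did: heap, indexes, trigger effects */
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_END_TRY();

Batching means that when a row in the batch fails you have to roll back and
redo the whole batch one row at a time, which is where most of the complexity
comes from; and note that any insertion into a logging table made inside the
subtransaction gets rolled back along with the bad row, which is exactly
problem (c).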

regards, tom lane
