On Thu, Oct 8, 2009 at 11:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Lest there be any unclarity, I am NOT trying to shoot down this
>> feature with my laser-powered bazooka.
> Well, if you need somebody to do that

Well, I'm trying not to demoralize people who have put in hard work,
however unusable the result may be. Still, your points are well taken.
I did raise some of them (with a lot less technical detail) in my
review last night.

> So as far as I can see, the only form of COPY error handling that
> wouldn't be a cruel joke is to run a separate subtransaction for each
> row, and roll back the subtransaction on error. Of course the problems
> with that are (a) speed, (b) the 2^32 limit on command counter IDs
> would mean a max of 2^32 rows per COPY, which is uncomfortably small
> these days. Previous discussions of the problem have mentioned trying
> to batch multiple rows per subtransaction to alleviate both issues.
> Not easy of course, but that's why it's not been done yet. With a
> patch like this you'd also have (c) how to avoid rolling back the
> insertions into the logging table.

Yeah. I think it's going to be hard to make this work without having
standalone transactions. One idea would be to start a subtransaction,
insert tuples until one fails, then roll back the subtransaction and
start a new one, and continue on until the error limit is reached. At
the end, if the number of rollbacks is > 0, then roll back the final
subtransaction also. This wouldn't have the property of getting the
error-free rows into the table, but at least it would let you report
all the errors in a single pass, hopefully without being gratingly
slow. Committing a subtransaction for every single row is going to be
really painful, especially after Hot Standby goes in and we have to
issue a WAL record after every 64 subtransactions (AIUI).
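
The subtransaction idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module and SQLite savepoints as a stand-in for PostgreSQL subtransactions, and the function name copy_report_all_errors, the table target, and the error_limit parameter are all hypothetical. It is not how COPY is or would be implemented.

```python
import sqlite3


def copy_report_all_errors(conn, rows, error_limit=10):
    """Insert rows under savepoints and collect every error in one pass.

    Hypothetical sketch: savepoints stand in for subtransactions.
    If any row fails, nothing is left in the table at the end.
    """
    errors = []
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute("SAVEPOINT batch")
    for lineno, row in enumerate(rows, start=1):
        try:
            cur.execute("INSERT INTO target (id, qty) VALUES (?, ?)", row)
        except sqlite3.Error as exc:
            errors.append((lineno, row, str(exc)))
            # Throw away the current batch and start a fresh one.
            cur.execute("ROLLBACK TO batch")
            cur.execute("RELEASE batch")
            cur.execute("SAVEPOINT batch")
            if len(errors) >= error_limit:
                break
    if errors:
        # At least one rollback happened, so discard the final batch too.
        cur.execute("ROLLBACK TO batch")
    cur.execute("RELEASE batch")
    cur.execute("COMMIT")
    return errors


# isolation_level=None lets us issue BEGIN/SAVEPOINT/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE target ("
    " id INTEGER PRIMARY KEY,"
    " qty INTEGER NOT NULL CHECK (qty >= 0))"
)
rows = [(1, 5), (2, -1), (3, 7), (3, 8), (4, None)]
errors = copy_report_all_errors(conn, rows)
remaining = conn.execute("SELECT count(*) FROM target").fetchone()[0]
```

Note that a row which would only have failed against an already-discarded row (say, a duplicate key) goes unreported in this scheme, so a clean second pass is not guaranteed.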

Another possible approach, which isn't perfect either, is the idea of
allowing COPY to generate a single column of output of type text.
That greatly reduces the number of possible error cases, and at least
gets the data into the DB where you can hack on it. But it's still
going to be painful for some use cases.
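
To make the single-text-column idea concrete, here is a rough sketch, again using Python's sqlite3 module as a stand-in rather than PostgreSQL itself. The names stage_then_convert, staging, and target are hypothetical, and the comma-separated two-integer format is an assumption chosen for the example. Every raw line is stored as text first, so loading alone cannot hit a type error; conversion happens afterwards, where bad lines can be listed and fixed.

```python
import sqlite3


def stage_then_convert(conn, lines):
    """Load raw lines as text, then convert the ones that parse.

    Hypothetical sketch: returns the line numbers that failed to
    convert; those lines remain available in the staging table.
    """
    conn.execute("CREATE TABLE staging (line_no INTEGER PRIMARY KEY, raw TEXT)")
    conn.execute("CREATE TABLE target (id INTEGER, qty INTEGER)")
    # Step 1: everything goes in as plain text.
    conn.executemany(
        "INSERT INTO staging (line_no, raw) VALUES (?, ?)",
        enumerate(lines, start=1),
    )
    # Step 2: parse each staged line, keeping track of failures.
    good, bad = [], []
    staged = conn.execute("SELECT line_no, raw FROM staging ORDER BY line_no")
    for line_no, raw in staged.fetchall():
        try:
            id_text, qty_text = raw.split(",")
            good.append((int(id_text), int(qty_text)))
        except ValueError:
            bad.append(line_no)
    conn.executemany("INSERT INTO target (id, qty) VALUES (?, ?)", good)
    conn.commit()
    return bad


staging_conn = sqlite3.connect(":memory:")
bad_lines = stage_then_convert(staging_conn, ["1,5", "2,abc", "3", "4,7"])
loaded = staging_conn.execute("SELECT id, qty FROM target ORDER BY id").fetchall()
```

The painful part mentioned above shows up in step 2: all parsing and validation that COPY would normally do has to be redone by hand.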