Re: COPY enhancements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 15:32:17
Message-ID: 603c8f070910080832o3b83a332p63575301a44c4c23@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 8, 2009 at 11:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Lest there be any unclarity, I am NOT trying to shoot down this
>> feature with my laser-powered bazooka.
>
> Well, if you need somebody to do that

Well, I'm trying not to demoralize people who have put in hard work,
even if the result isn't usable as it stands. Still, your points are
well taken. I did raise some of them (with a lot less technical
detail) in my review last night.

> So as far as I can see, the only form of COPY error handling that
> wouldn't be a cruel joke is to run a separate subtransaction for each
> row, and roll back the subtransaction on error.  Of course the problems
> with that are (a) speed, (b) the 2^32 limit on command counter IDs
> would mean a max of 2^32 rows per COPY, which is uncomfortably small
> these days.  Previous discussions of the problem have mentioned trying
> to batch multiple rows per subtransaction to alleviate both issues.
> Not easy of course, but that's why it's not been done yet.  With a
> patch like this you'd also have (c) how to avoid rolling back the
> insertions into the logging table.

Yeah. I think it's going to be hard to make this work without having
standalone transactions. One idea would be to start a subtransaction,
insert tuples until one fails, then roll back the subtransaction and
start a new one, and continue on until the error limit is reached. At
the end, if the number of rollbacks is > 0, then roll back the final
subtransaction also. This wouldn't have the property of getting the
error-free data into the table, but at least it would let you report
all the errors in a single pass, hopefully without being gratingly
slow. Subcommitting every single row is going to be really painful,
especially after Hot Standby goes in and we have to issue a WAL record
after every 64 subtransactions (AIUI).
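
To make the control flow concrete, here is a toy model of that scheme in
Python. It is only a sketch, not backend code: the "subtransaction" is
just an in-memory list, and copy_with_error_report, parse_int_pair and
error_limit are names made up for illustration. Rolling back is
modelled by discarding the tuples collected since the last failure.

```python
from typing import Callable, Iterable, List, Tuple


def copy_with_error_report(
    rows: Iterable[str],
    parse_row: Callable[[str], tuple],
    error_limit: int,
) -> Tuple[List[tuple], List[Tuple[int, str]]]:
    """Toy model: returns (rows_kept, errors) for one pass over the input."""
    pending: List[tuple] = []            # tuples in the current "subtransaction"
    errors: List[Tuple[int, str]] = []   # (input line number, message)

    for lineno, raw in enumerate(rows, start=1):
        try:
            pending.append(parse_row(raw))
        except ValueError as exc:
            errors.append((lineno, str(exc)))
            pending.clear()              # roll back the current subtransaction
            if len(errors) >= error_limit:
                break                    # error limit reached, stop scanning
            # the next row implicitly starts a fresh subtransaction

    if errors:
        pending.clear()                  # any rollback => roll back the last one too
    return pending, errors


def parse_int_pair(raw: str) -> tuple:
    """Hypothetical row parser: two comma-separated integers."""
    a, b = raw.split(",")                # wrong field count raises ValueError
    return int(a), int(b)                # non-numeric field raises ValueError


if __name__ == "__main__":
    kept, errs = copy_with_error_report(
        ["1,2", "x,3", "4,5", "6"], parse_int_pair, error_limit=10
    )
    print("rows kept:", kept)
    print("bad lines:", [lineno for lineno, _ in errs])
```

Note that the number of subtransactions is one more than the number of
bad rows, not one per row, which is the whole point. The cost is that a
single bad row means nothing gets loaded.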

Another possible approach, which isn't perfect either, is the idea of
allowing COPY to generate a single column of output of type text[].
That greatly reduces the number of possible error cases, and at least
gets the data into the DB where you can hack on it. But it's still
going to be painful for some use cases.
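
Roughly what I have in mind, again as a Python toy rather than anything
resembling real COPY syntax (load_as_text_arrays and split_good_bad are
invented names, and the tab delimiter is an assumption): the load step
does no type conversion at all, so about the only thing left to go wrong
there is the line splitting itself, and the cleanup happens in a second
pass once the data is already staged.

```python
from typing import Callable, Iterable, List, Tuple


def load_as_text_arrays(lines: Iterable[str], delimiter: str = "\t") -> List[List[str]]:
    """Stage each input line as a list of strings, with no type conversion."""
    return [line.rstrip("\n").split(delimiter) for line in lines]


def split_good_bad(
    staged: Iterable[List[str]],
    n_cols: int,
    convert: Callable[[List[str]], tuple],
) -> Tuple[List[tuple], List[List[str]]]:
    """Second pass: convert what converts, keep the rest as raw text."""
    good: List[tuple] = []
    bad: List[List[str]] = []
    for fields in staged:
        try:
            if len(fields) != n_cols:
                raise ValueError(f"expected {n_cols} fields, got {len(fields)}")
            good.append(convert(fields))
        except ValueError:
            bad.append(fields)           # still available for manual fixing
    return good, bad


if __name__ == "__main__":
    staged = load_as_text_arrays(["1\t2\n", "x\t3\n", "4\n"])
    good, bad = split_good_bad(staged, 2, lambda f: (int(f[0]), int(f[1])))
    print("good:", good)
    print("bad:", bad)
```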

...Robert
