Re: COPY enhancements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 16:50:41
Message-ID: 603c8f070910080950n1d409d38v40cd4147c961b1dd@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 8, 2009 at 12:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Oct 8, 2009 at 11:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I wonder whether we could break down COPY into sub-sub
>>> transactions to work around that...
>
>> How would that work?  Don't you still need to increment the command counter?
>
> Actually, command counter doesn't help because incrementing the CC
> doesn't give you a rollback boundary between rows inserted before it
> and afterwards.  What I was vaguely imagining was

Oh, right.

> So really we have to find some way to only expend one XID per failure,
> not one per row.

Agreed.

> Another approach that was discussed earlier was to divvy the rows into
> batches.  Say every thousand rows you sub-commit and start a new
> subtransaction.  Up to that point you save aside the good rows somewhere
> (maybe a tuplestore).  If you get a failure partway through a batch,
> you start a new subtransaction and re-insert the batch's rows up to the
> bad row.  This could be pretty awful in the worst case, but most of the
> time it'd probably perform well.  You could imagine dynamically adapting
> the batch size depending on how often errors occur ...

Yeah, I think that's promising. There is of course the possibility
that a row which previously succeeded could fail the next time around,
but most of the time that shouldn't happen, and it should be possible
to code it so that it still behaves somewhat sanely if it does.
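
For illustration, here is a minimal sketch of the batching idea quoted above. It is not COPY code: it uses SQLite savepoints from Python as a stand-in for backend subtransactions, and the table `t` and the functions `_insert_all` and `copy_in_batches` are hypothetical names made up for this example. It is also a slight variant of the scheme described: after a failure it drops the bad row and retries the rest of the batch in a fresh savepoint, which replays the earlier good rows and handles a row that fails on replay the same way as any other bad row. It does not model XID consumption.

```python
import sqlite3

INSERT_SQL = "INSERT INTO t (id, val) VALUES (?, ?)"  # hypothetical target table


def _insert_all(cur, rows):
    """Insert rows inside one savepoint.

    Returns None if every row went in, otherwise rolls the savepoint back
    and returns the index of the first row that failed.
    """
    cur.execute("SAVEPOINT batch")
    for i, row in enumerate(rows):
        try:
            cur.execute(INSERT_SQL, row)
        except sqlite3.IntegrityError:
            # Undo everything inserted since the savepoint, then discard it.
            cur.execute("ROLLBACK TO batch")
            cur.execute("RELEASE batch")
            return i
    cur.execute("RELEASE batch")  # "sub-commit" the whole batch
    return None


def copy_in_batches(conn, rows, batch_size=1000):
    """Load rows in batches, skipping rows that violate a constraint.

    Returns (number_of_rows_inserted, list_of_rejected_rows).
    Assumes conn was opened with isolation_level=None so that the
    transaction is controlled explicitly here.
    """
    cur = conn.cursor()
    inserted = 0
    rejected = []
    cur.execute("BEGIN")
    for start in range(0, len(rows), batch_size):
        batch = list(rows[start:start + batch_size])
        while batch:
            bad = _insert_all(cur, batch)
            if bad is None:
                inserted += len(batch)
                break
            rejected.append(batch[bad])
            # Drop the bad row and retry; the rows before it get replayed.
            batch = batch[:bad] + batch[bad + 1:]
    cur.execute("COMMIT")
    return inserted, rejected
```

With no errors this costs one savepoint per batch. Each bad row costs one extra savepoint plus a replay of its batch, so a batch full of bad rows is the awful worst case mentioned above.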

...Robert
