Re: COPY enhancements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 16:21:00
Message-ID: 25557.1255018860@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Oct 8, 2009 at 11:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I wonder whether we could break down COPY into sub-sub
>> transactions to work around that...

> How would that work? Don't you still need to increment the command counter?

Actually, the command counter doesn't help, because incrementing the CC
doesn't give you a rollback boundary between the rows inserted before it
and the rows inserted afterwards. What I was vaguely imagining was

-- outer transaction for whole COPY

-- sub-transactions that are children of outer transaction

-- sub-sub-transactions that are children of sub-transactions

You'd eat a sub-sub-transaction per row, and start a new sub-transaction
every 2^32 rows.

However, on second thought this really doesn't get us anywhere; it just
moves the 2^32 restriction somewhere else. Once the outer transaction
gets to be more than 2^31 XIDs old, the database is going to stop
because of XID wraparound.

So really we have to find some way to only expend one XID per failure,
not one per row.
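
To put rough numbers on that (purely illustrative figures; the row count
and error rate below are invented):

    XID_BUDGET = 2 ** 31          # roughly what fits before wraparound

    rows = 10_000_000_000         # a hypothetical 10-billion-row COPY
    error_rate = 0.001            # say one bad row per thousand

    xids_per_row = rows                       # one subxact per row
    xids_per_failure = int(rows * error_rate) # one subxact per bad row

    print(xids_per_row > XID_BUDGET)      # True:  per-row scheme can't finish
    print(xids_per_failure > XID_BUDGET)  # False: per-failure scheme is fine

Burning an XID per row caps a single COPY at around two billion rows no
matter what; burning one only per failure makes the limit depend on the
error rate instead.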

Another approach that was discussed earlier was to divvy the rows into
batches. Say every thousand rows you sub-commit and start a new
subtransaction. Up to that point you save aside the good rows somewhere
(maybe a tuplestore). If you get a failure partway through a batch,
you start a new subtransaction and re-insert the batch's rows up to the
bad row. This could be pretty awful in the worst case, but most of the
time it'd probably perform well. You could imagine dynamically adapting
the batch size depending on how often errors occur ...
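
In rough pseudocode (names like insert_row and copy_rows are made up for
illustration, and a plain list stands in for the tuplestore; this is not
the backend code path):

    BATCH_SIZE = 1000

    class BadRow(Exception):
        pass

    def insert_row(row):
        # stand-in for the real per-row insert; raises on a bad row
        if row.get("bad"):
            raise BadRow(row)

    def copy_rows(rows):
        staged = []     # good rows since the last sub-commit (the "tuplestore")
        inserted = 0
        rejected = []
        for row in rows:
            try:
                insert_row(row)
                staged.append(row)
            except BadRow:
                # the subtransaction rolls back, losing everything since the
                # last sub-commit; start a new one, replay the batch's good
                # rows up to the bad one, set the bad row aside, and go on
                for good in staged:
                    insert_row(good)
                rejected.append(row)
            if len(staged) >= BATCH_SIZE:
                # sub-commit: these rows are safely down, so we no longer
                # need to keep them around for replay
                staged.clear()
                inserted += BATCH_SIZE
        inserted += len(staged)
        return inserted, rejected

Re-inserting the batch's good rows after every failure is the "pretty
awful in the worst case" part; the adaptive variant would just shrink or
grow BATCH_SIZE as errors come and go.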

regards, tom lane
