Re: COPY enhancements

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY enhancements
Date: 2009-10-08 17:19:03
Message-ID: alpine.GSO.2.01.0910081310300.25300@westnet.com
Lists: pgsql-hackers

On Thu, 8 Oct 2009, Rod Taylor wrote:

> 1) Having copy remember which specific line caused the error. So it can
> replace lines 1 through 487 in a subtransaction since it knows those are
> successful. Run 488 in its own subtransaction. Run 489 through ... in a
> new subtransaction.

This is the standard technique used in other bulk loaders I'm aware of.
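
A minimal sketch of that targeted retry, for illustration only. It assumes
a hypothetical try_load() callback that attempts a run of rows in one
subtransaction and returns None on success, or the offset of the first
failing row after rolling the whole attempt back. Neither name is part of
COPY or of any existing loader, and only bad-data failures are modelled.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def load_with_targeted_retry(
    rows: Sequence[str],
    try_load: Callable[[Sequence[str]], Optional[int]],
) -> Tuple[int, List[int]]:
    """Load rows, isolating each bad row instead of retrying blindly.

    try_load is a hypothetical callback: it attempts the given rows in one
    subtransaction and returns None if they were all accepted, or the
    0-based offset of the first failing row (nothing is kept in that case).
    Returns (number of rows loaded, absolute indexes of rejected rows).
    """
    loaded = 0
    rejected: List[int] = []
    start = 0
    while start < len(rows):
        bad = try_load(rows[start:])
        if bad is None:
            # Everything from start onwards went in cleanly.
            loaded += len(rows) - start
            break
        if bad > 0:
            # Rows before the failing one are known good: replay them
            # together in a single subtransaction.
            if try_load(rows[start:start + bad]) is not None:
                raise RuntimeError("previously clean rows failed on replay")
            loaded += bad
        # Run the offending row in its own subtransaction.
        culprit = start + bad
        if try_load(rows[culprit:culprit + 1]) is None:
            loaded += 1
        else:
            rejected.append(culprit)
        # Continue with the rows after the failure.
        start = culprit + 1
    return loaded, rejected
```

The RuntimeError branch is where this sketch gives up: a replay that fails
for reasons other than bad data is not handled here.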

> 2) Increasing the number of records per subtransaction if data is clean.
> It wouldn't take long until you were inserting millions of records per
> subtransaction for a large data set.

You can make it adaptive in both directions with some boundaries. Double
the batch size every time there's a clean commit and halve it every time
there's an error; start batching at 1024 and bound the size to the range
[1, 1048576]. That's close to optimal behavior here if combined with the
targeted retry described in (1).
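
The sizing rule can be sketched in a few lines. The constant and function
names below are illustrative assumptions; only the numbers come from the
description above.

```python
MIN_BATCH = 1            # lower bound of the range
MAX_BATCH = 1_048_576    # upper bound of the range (2**20)
INITIAL_BATCH = 1024     # starting batch size


def next_batch_size(current: int, clean_commit: bool) -> int:
    """Double after a clean commit, halve after an error, stay in bounds."""
    if clean_commit:
        return min(current * 2, MAX_BATCH)
    return max(current // 2, MIN_BATCH)
```

Starting from 1024, ten consecutive clean commits reach the upper bound,
and ten consecutive errors bring the size down to a single row.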

The retry scheduling and batch size parts are the trivial and well
understood parts here. Actually getting all this to play nicely with
transactions and commit failures (rather than just bad data failures) is
what's difficult.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
