From:
Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To:
Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:
Re: COPY enhancements
Date:
2009-10-08 16:12:40
Message-ID:
751261b20910080912md3176e8ga671c768b4fee055@mail.gmail.com (view raw or flat )
Thread:
2009-09-10 14:33:48 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-10 15:29:07 from David Fetter <david(at)fetter(dot)org>
2009-09-10 16:06:36 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-10 21:05:28 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-10 21:32:12 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-10 22:11:03 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-10 22:34:36 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-10 23:28:57 from David Fetter <david(at)fetter(dot)org>
2009-09-10 23:29:27 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-10 23:58:01 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-11 02:53:27 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 13:16:19 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-11 14:18:51 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 14:53:33 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-11 15:04:06 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 15:13:29 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 18:55:43 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-11 19:08:31 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 22:56:42 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-11 23:04:30 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-11 23:04:48 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 15:11:47 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 15:26:16 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 15:33:38 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 15:37:33 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 16:09:26 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 16:28:03 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 19:44:52 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 20:02:33 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 21:07:15 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 21:32:42 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 21:45:07 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 21:52:50 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 21:27:19 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-09-11 22:04:06 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-11 22:13:17 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-12 07:07:13 from Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
2009-09-12 07:12:23 from Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
2009-09-12 08:22:01 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-09-12 15:13:39 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-12 15:23:57 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-12 15:44:01 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-12 15:57:00 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-12 21:51:49 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-09-13 21:17:51 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-13 21:33:59 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-14 00:17:45 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-19 19:55:28 from Bruce Momjian <bruce(at)momjian(dot)us>
2009-09-11 22:35:46 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-12 08:43:46 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-09-14 18:41:33 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-14 19:37:53 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-11 17:42:05 from David Fetter <david(at)fetter(dot)org>
2009-09-11 17:50:35 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-09-11 15:38:00 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-11 15:32:40 from Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
2009-09-11 15:34:04 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 15:49:21 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-09-11 20:56:11 from Stephen Frost <sfrost(at)snowman(dot)net>
2009-09-11 21:21:12 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-11 22:36:56 from Stephen Frost <sfrost(at)snowman(dot)net>
2009-09-10 23:33:32 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-10 23:44:16 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-09-10 23:54:47 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-10 23:55:36 from Alexander Sennhauser <alex(at)asterdata(dot)com>
2009-09-18 04:14:18 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-25 02:38:46 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-09-25 03:00:35 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-09-25 14:01:27 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-04 17:43:18 from Jeff Davis <pgsql(at)j-davis(dot)com>
2009-10-05 02:08:11 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-04 21:32:15 from Selena Deckelmann <selenamarie(at)gmail(dot)com>
2009-10-05 15:30:48 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-05 17:01:20 from Josh Berkus <josh(at)agliodbs(dot)com>
2009-10-07 02:07:31 from Emmanuel Cecchet <manu(at)frogthinker(dot)org>
2009-10-07 07:17:20 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-10-07 10:37:54 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-07 13:33:01 from Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
2009-10-07 13:56:41 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-08 22:23:03 from Bruce Momjian <bruce(at)momjian(dot)us>
2009-10-08 22:35:05 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-10-08 23:30:50 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-07 11:07:36 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-07 13:12:44 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-07 13:28:02 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-10-07 14:29:08 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-07 14:45:45 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-10-07 15:43:07 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-07 14:15:31 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-07 15:39:37 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-07 15:42:11 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-07 15:45:11 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-07 16:25:28 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-07 23:45:38 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-10-07 23:52:15 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-10-08 01:07:50 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 15:29:01 from Alvaro Herrera <alvherre(at)commandprompt(dot)com>
2009-10-08 15:33:49 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 22:48:00 from Bruce Momjian <bruce(at)momjian(dot)us>
2009-10-07 15:50:05 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-08 02:30:57 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 08:42:09 from Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
2009-10-08 10:43:30 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 12:34:33 from Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
2009-10-08 13:42:28 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 15:01:53 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-08 15:32:17 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 15:50:19 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-08 15:59:19 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 16:16:35 from "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
2009-10-08 16:37:11 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-08 16:46:43 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 16:49:33 from "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
2009-10-08 16:21:00 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-08 16:50:41 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 17:26:10 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-08 18:43:51 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-08 23:15:14 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-09 06:29:04 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-09 13:44:42 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-09 13:37:34 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-08 17:09:34 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-10-12 13:06:33 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-11-10 01:01:51 from Andrew Dunstan <andrew(at)dunslane(dot)net>
2009-11-10 03:03:22 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-13 13:57:44 from Emmanuel Cecchet <manu(at)frogthinker(dot)org>
2009-10-13 14:35:15 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-13 16:34:31 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-13 17:10:05 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-19 04:40:30 from Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
2009-10-19 15:21:48 from Alvaro Herrera <alvherre(at)commandprompt(dot)com>
2009-10-19 15:34:30 from Robert Haas <robertmhaas(at)gmail(dot)com>
2009-10-20 19:00:47 from Emmanuel Cecchet <manu(at)asterdata(dot)com>
2009-10-20 19:17:13 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-13 16:30:27 from Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
2009-10-08 16:12:40 from Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
2009-10-08 17:19:03 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-10-09 09:10:55 from Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
2009-10-09 13:52:18 from Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
2009-10-09 15:42:19 from Greg Smith <gsmith(at)gregsmith(dot)com>
2009-10-12 13:13:15 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-08 12:40:36 from Simon Riggs <simon(at)2ndQuadrant(dot)com>
2009-10-08 14:22:29 from "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
2009-10-08 22:38:40 from Bruce Momjian <bruce(at)momjian(dot)us>
Lists:
pgsql-hackers
> Yeah. I think it's going to be hard to make this work without having
> standalone transactions. One idea would be to start a subtransaction,
> insert tuples until one fails, then rollback the subtransaction and
> start a new one, and continue on until the error limit is reached.
>
I've found performance is reasonable, for data with low numbers of errors
(say 1 per 100,000 records or less) doing the following:
SAVEPOINT bulk;
Insert 1000 records using COPY.
If there is an error, rollback to bulk, and step through each line
individually within its own "individual" subtransaction. All good lines are
kept and bad lines are logged; client side control makes logging trivial.
The next set of 1000 records is done in bulk again.
1000 records per savepoint seems to be a good point for my data without too
much time lost to overhead or too many records to retry due to a failing
record. Of course, it is controlled by the client side rather than server
side so reporting back broken records is trivial.
It may be possible to boost performance by:
1) Having copy remember which specific line caused the error. So it can
replace lines 1 through 487 in a subtransaction since it knows those are
successful. Run 488 in its on subtransaction. Run 489 through ... in a new
subtransaction.
2) Increasing the number of records per subtransaction if data is clean. It
wouldn't take long until you were inserting millions of records per
subtransaction for a large data set. This should make the subtransaction
overhead minimal. Small imports would still run slower but very large
imports of clean data should be essentially the same speed in the end.
In response to
Responses
pgsql-hackers by date
Next :From: David FetterDate: 2009-10-08 16:15:47
Subject : Re: Writeable CTEs and side effects
Previous :From : Dan ColishDate : 2009-10-08 16:03:48
Subject : one line comment style