Re: Improve BULK insertion

From: Patrick Hatcher <PHatcher(at)macys(dot)com>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve BULK insertion
Date: 2004-12-04 14:47:17
Message-ID: OFB8DE37C9.38209B63-ON88256F60.00513B9D-88256F60.00513BD6@fds.com
Lists: pgsql-performance

I do mass inserts daily into PG. I drop all indexes except my primary key and then use the COPY FROM command. This usually takes less than 30 seconds. I spend more time waiting for indexes to recreate.

Patrick Hatcher
Macys.Com

-----pgsql-performance-owner(at)postgresql(dot)org wrote: -----

To: pgsql-performance(at)postgresql(dot)org
From: Christopher Browne <cbbrowne(at)acm(dot)org>
Sent by: pgsql-performance-owner(at)postgresql(dot)org
Date: 2004-12-04 06:48AM
Subject: Re: [PERFORM] Improve BULK insertion

In the last exciting episode, grupos(at)carvalhaes(dot)net (Grupos) wrote:
> Hi!
>
> I need to insert 500,000 records into a table frequently. It's a bulk
> insertion from my application.
> I am getting very poor performance. PostgreSQL inserts very fast until
> tuple 200,000, and after that the insertion becomes really slow.
> I am watching the log and there are a lot of transaction log messages,
> something like:
>
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000012"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000013"
> 2004-12-04 11:08:59 LOG:  recycled transaction log file "0000000600000011"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000015"
> 2004-12-04 11:14:04 LOG:  recycled transaction log file "0000000600000014"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000016"
> 2004-12-04 11:19:08 LOG:  recycled transaction log file "0000000600000017"
> 2004-12-04 11:24:10 LOG:  recycled transaction log file "0000000600000018"

It is entirely normal for there to be a lot of transaction log file
recycling when bulk inserts are taking place; that goes through a lot
of transaction logs.

> How can I configure PostgreSQL to have better performance on these
> bulk insertions? I already increased the memory values.

Memory is, as likely as not, NOT the issue.

Two questions:

 1.  How are you doing the inserts?  Via INSERT statements?  Or
     via COPY statements?  What sort of transaction grouping
     is involved?

     COPY is way faster than INSERT, and grouping plenty of updates
     into a single transaction is generally a "win."

 2.  What is the schema like?  Does the table have a foreign key
     constraint?  Does it have a bunch of indices?

     If there should eventually be lots of indices, it tends to be
     faster to create the table with none/minimal indices, and add
     indexes afterwards, as long as your "load" process can be trusted
     to not break "unique" constraints...

     If there is some secondary table with a foreign key constraint,
     and _that_ table is growing, it is possible that a sequential
     scan is being used to search the secondary table where, if you
     did an ANALYZE on that table, an index scan would be preferred
     once it grew to a larger size...

There isn't a particular reason for PostgreSQL to "hit a wall" upon
seeing 200K records; I and coworkers routinely load database dumps
that have millions of (sometimes pretty fat) records, and they don't
"choke."  That's true whether we're talking about loading things onto my
(somewhat wimpy) desktop PC, an SMP Xeon system with a small RAID
array, or higher-end stuff involving high-end SMP and EMC disk arrays.
The latter is obviously orders of magnitude faster than desktop
equipment :-).
--
(format nil "~S(at)~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the Evil Overlord #207. "Employees will have conjugal visit
trailers which they may use provided they call in a replacement and
sign out on the timesheet. Given this, anyone caught making out in a
closet while leaving their station unmonitored will be shot."
<http://www.eviloverlord.com/>

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
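The workflow described in this thread (drop secondary indexes, COPY inside a single transaction, rebuild indexes, then ANALYZE so the planner has fresh statistics) can be sketched roughly like this; the table, index, and file names here are hypothetical, not from the original posts:

```sql
-- Sketch only: assumes a hypothetical table "items" with a primary key
-- and one secondary index, loading from a server-side tab-delimited file.
BEGIN;

-- Drop secondary indexes; keep the primary key so the load
-- still catches duplicate-key violations.
DROP INDEX items_category_idx;

-- One COPY in one transaction is far cheaper than 500,000
-- individual INSERT statements, each in its own transaction.
COPY items FROM '/tmp/items.dat' WITH DELIMITER '\t';

-- Rebuild the secondary index once, over the full data set.
CREATE INDEX items_category_idx ON items (category);

COMMIT;

-- Refresh planner statistics so foreign-key checks and queries
-- against the newly grown table prefer index scans over seq scans.
ANALYZE items;
```

If INSERT statements are unavoidable (e.g. the data comes from an application rather than a file), the same transaction-grouping point applies: wrap batches of many INSERTs in one BEGIN/COMMIT rather than letting each statement commit on its own.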

