
From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: optimising data load
Date: 2002-05-22 17:40:03
Message-ID: 1022089203.19119.26.camel@rebel

On Wed, 2002-05-22 at 08:45, John Taylor wrote:
>
> Hi all,
>
> I'm (still) working on an application to regularly populate my database with data provided by
> a third-party DB.
>
> I'm still having really bad performance problems.
> There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
> 30 hours to populate a single table is just not on. There must be something seriously bad going on.

From what I see of your INSERT statement, you are joining orderheader
to orderlines and then inserting those records into orderlines.
Is that correct?

I think I'd make a view from the SELECT portion. Then, nightly:
1. COPY out the view.
2. Drop the index(es) on orderlines.
3. COPY into orderlines.
4. Recreate the index(es) on orderlines.

Do the same for orderlinesupdates. The thing is that
modifying indexes is a _very_ expensive operation, no matter
what RDBMS or hardware configuration you have...
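
Something like this, roughly (a sketch only: "orderlines_load",
"load_scratch", and "orderlines_someindex" are invented names; plain
COPY can't read from a view, so it gets staged through a scratch
table; and the ORDER BY / LIMIT logic from your original query would
have to move into the extraction step):

  -- One-time setup: a view over the SELECT half of the INSERT
  -- (column list lifted from your LIVE query):
  CREATE VIEW orderlines_load AS
    SELECT oh.theorder, 'P' AS type, ' 0310' AS stock,
           coalesce(ol.line + 1, 1) AS line,
           5 AS ordercurrent, 0 AS sellingquant, .52 AS price,
           0 AS discount, 0 AS vatrate, '' AS comment
    FROM orderheader oh
    LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
    WHERE oh.account = ' MILN1' AND oh.delivery = 1
      AND oh.thedate = '2002-06-01' AND oh.ordertype = 'O';

  -- Nightly routine:
  SELECT * INTO TEMP load_scratch FROM orderlines_load;  -- materialise
  COPY load_scratch TO '/tmp/orderlines.dat';            -- 1. COPY out
  DROP INDEX orderlines_someindex;                       -- 2. per index
  COPY orderlines FROM '/tmp/orderlines.dat';            -- 3. bulk load
  CREATE INDEX orderlines_someindex
      ON orderlines (stock);                             -- 4. recreate

Note that COPY without a column list has to match the table's full
column order, so the scratch table's columns must line up with
orderlines exactly; also, COPY to/from a server-side file needs
superuser (psql's \copy is the unprivileged alternative).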

> I have to update two different databases: live, and an update log.
>
> I've included explains for both the db's below.
>
> I have a few questions:
>
> 1) Can it really be right that it is going this slow ?
> I'm running Redhat Linux with kernel 2.4.7, and postgres 7.1.3
>
> 2) Can anyone see a way to optimise these queries further ?
>
> 3) Is it likely to be quicker if I use a query to obtain ol.line, and then generate a CSV file for use with COPY ?
>
> Thanks
> JohnT
>
> --- LIVE ---
> explain INSERT INTO orderlines (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment)
> SELECT oh.theorder,'P',' 0310',coalesce(ol.line+1,1),5,0,.52,0,0,'' FROM orderheader oh
> LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
> WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O'
> ORDER BY ol.line DESC LIMIT 1;
> NOTICE: QUERY PLAN:
>
> Subquery Scan *SELECT*  (cost=47.41..47.41 rows=1 width=12)
>   ->  Limit  (cost=47.41..47.41 rows=1 width=12)
>         ->  Sort  (cost=47.41..47.41 rows=1 width=12)
>               ->  Nested Loop  (cost=0.00..47.40 rows=1 width=12)
>                     ->  Index Scan using orderheader_account on orderheader oh  (cost=0.00..21.64 rows=1 width=4)
>                     ->  Index Scan using orderlines_pkey on orderlines ol  (cost=0.00..25.54 rows=17 width=8)
>
> EXPLAIN
>
> --- UPDATE LOG ---
> explain INSERT INTO orderlinesupdates (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
> SELECT oh.theorder,'P',' 0310',coalesce(ol.line,ol2.line+1,1),5,0,.52,0,0,'',128,'+','C' FROM orderheaderupdates oh
> LEFT OUTER JOIN orderlinesupdates ol ON oh.theorder = ol.theorder AND ol.stock=' 0310'
> LEFT OUTER JOIN orderlinesupdates ol2 ON oh.theorder = ol2.theorder
> WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND oh.ordertype='O'
> ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC LIMIT 1;
> NOTICE: QUERY PLAN:
>
> Subquery Scan *SELECT*  (cost=81.29..81.29 rows=1 width=36)
>   ->  Limit  (cost=81.29..81.29 rows=1 width=36)
>         ->  Sort  (cost=81.29..81.29 rows=1 width=36)
>               ->  Nested Loop  (cost=0.00..81.28 rows=1 width=36)
>                     ->  Nested Loop  (cost=0.00..52.47 rows=1 width=28)
>                           ->  Index Scan using orderheaderupdates_account on orderheaderupdates oh  (cost=0.00..23.62 rows=1 width=8)
>                           ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol  (cost=0.00..28.60 rows=17 width=20)
>                     ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates ol2  (cost=0.00..28.60 rows=17 width=8)
>
> EXPLAIN

--
+---------------------------------------------------------+
| Ron Johnson, Jr. Home: ron(dot)l(dot)johnson(at)cox(dot)net |
| Jefferson, LA USA http://ronandheather.dhs.org:81 |
| |
| "I have created a government of whirled peas..." |
| Maharishi Mahesh Yogi, 12-May-2002, |
| CNN, Larry King Live |
+---------------------------------------------------------+
