
Re: optimising data load

From: "Patrick Hatcher" <PHatcher(at)macys(dot)com>
To: "John Taylor <postgres" <postgres(at)jtresponse(dot)co(dot)uk>
Cc: pgsql-novice(at)postgresql(dot)org, pgsql-novice-owner(at)postgresql(dot)org
Subject: Re: optimising data load
Date: 2002-05-22 15:29:07
Message-ID: OF90D8E83C.A706D781-ON88256BC1.0054A6F0@fds.com
Lists: pgsql-novice
Dump the records from the other database to a text file and then use the COPY
command in Pg.  I update tables nightly with 400K+ records and it only
takes 1-2 minutes.  You should drop and re-create your indexes and then do a
VACUUM ANALYZE.
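
A rough sketch of that workflow (the index name and file path here are
hypothetical; adjust them to the actual schema):

```sql
-- Drop secondary indexes so the bulk load doesn't maintain them row by row
DROP INDEX orderlines_theorder_idx;

-- Bulk-load the tab-delimited dump produced from the other database
COPY orderlines FROM '/tmp/orderlines.txt';

-- Re-create the index in one pass, then refresh planner statistics
CREATE INDEX orderlines_theorder_idx ON orderlines (theorder);
VACUUM ANALYZE orderlines;
```

Note that VACUUM cannot run inside a transaction block, so issue it after the
load commits.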

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-932-0610 office
HatcherPT - AIM




                                                                                                                     
John Taylor <postgres(at)jtresponse(dot)co(dot)uk>, sent by
pgsql-novice-owner(at)postgresql(dot)org, wrote on 05/22/2002 06:45 AM:
To: pgsql-novice(at)postgresql(dot)org
Subject: [NOVICE] optimising data load

Hi all,

I'm (still) working on an application to regularly populate my database
with data provided by a third-party DB.

I'm still having really bad performance problems.
There are 300,000 records to be inserted, but I'm only getting 10,000/hour.
30 hours to populate a single table is just not on. There must be something
seriously bad going on.

I have to update two different databases: live, and an update log.

I've included EXPLAIN output for both DBs below.

I have a few questions:

1) Can it really be right that it is going this slow ?
   I'm running Redhat Linux with kernel 2.4.7, and postgres 7.1.3

2) Can anyone see a way to optimise these queries further ?

3) Is it likely to be quicker if I use a query to obtain ol.line, and then
generate a CSV file for use with COPY?

Thanks
JohnT

--- LIVE ---
explain INSERT INTO orderlines
(theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment)

SELECT oh.theorder,'P','  0310',coalesce(ol.line+1,1),5,0,.52,0,0,''  FROM
orderheader oh
LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND
oh.ordertype='O'
ORDER BY ol.line DESC LIMIT 1;
NOTICE:  QUERY PLAN:

Subquery Scan *SELECT*  (cost=47.41..47.41 rows=1 width=12)
  ->  Limit  (cost=47.41..47.41 rows=1 width=12)
        ->  Sort  (cost=47.41..47.41 rows=1 width=12)
              ->  Nested Loop  (cost=0.00..47.40 rows=1 width=12)
                    ->  Index Scan using orderheader_account on orderheader
oh  (cost=0.00..21.64 rows=1 width=4)
                    ->  Index Scan using orderlines_pkey on orderlines ol
(cost=0.00..25.54 rows=17 width=8)

EXPLAIN

--- UPDATE LOG ---
explain INSERT INTO orderlinesupdates
(theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)

SELECT oh.theorder,'P','  0310',coalesce(ol.line,ol2.line+1,1),5,0,.52,0,0,'',128,'+','C' FROM
orderheaderupdates oh
LEFT OUTER JOIN orderlinesupdates ol ON oh.theorder = ol.theorder AND
ol.stock='  0310'
LEFT OUTER JOIN orderlinesupdates ol2 ON oh.theorder = ol2.theorder
WHERE oh.account=' MILN1' AND oh.delivery=1 AND oh.thedate='2002-06-01' AND
oh.ordertype='O'
ORDER BY oh.updated DESC, ol.line DESC, ol2.line DESC LIMIT 1;
NOTICE:  QUERY PLAN:

Subquery Scan *SELECT*  (cost=81.29..81.29 rows=1 width=36)
  ->  Limit  (cost=81.29..81.29 rows=1 width=36)
        ->  Sort  (cost=81.29..81.29 rows=1 width=36)
              ->  Nested Loop  (cost=0.00..81.28 rows=1 width=36)
                    ->  Nested Loop  (cost=0.00..52.47 rows=1 width=28)
                          ->  Index Scan using orderheaderupdates_account
on orderheaderupdates oh  (cost=0.00..23.62 rows=1 width=8)
                          ->  Index Scan using orderlinesupdates_theorder
on orderlinesupdates ol
 (cost=0.00..28.60 rows=17 width=20)
                    ->  Index Scan using orderlinesupdates_theorder on
orderlinesupdates ol2  (cost=0.00..28.60 rows=17 width=8)

EXPLAIN
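
The CSV/COPY approach from question 3 could be sketched like this (column
choices and the file path are illustrative, not taken from the original post):

```sql
-- 1. One query to compute the next line number per matching order, instead
--    of running the join once per inserted row:
SELECT oh.theorder, coalesce(max(ol.line), 0) + 1 AS nextline
FROM orderheader oh
LEFT OUTER JOIN orderlines ol ON oh.theorder = ol.theorder
WHERE oh.account=' MILN1' AND oh.delivery=1
  AND oh.thedate='2002-06-01' AND oh.ordertype='O'
GROUP BY oh.theorder;

-- 2. Merge that result with the third-party data into a tab-delimited file
--    outside the database, then bulk-load it in one statement:
COPY orderlines FROM '/tmp/orderlines.dat';
```

This replaces 300,000 individual INSERT ... SELECT statements with one scan
plus one COPY, which is usually where the bulk of the time goes.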

