Skip site navigation (1) Skip section navigation (2)

performance problems with subselects

From: John Taylor <postgres(at)jtresponse(dot)co(dot)uk>
To: pgsql-novice(at)postgresql(dot)org
Subject: performance problems with subselects
Date: 2002-04-23 13:29:34
Message-ID: 02042314293408.01601@splash.hq.jtresponse.co.uk (view raw or flat)
Thread:
Lists: pgsql-novice
Hi,

I have a rather complicated problem, that I hope someone can help with.

My application keeps track of updates to an order database running on another machine.

For reasons that are too complicated to go into, the orderlines are supplied with some of the
values missing, so I have to look them up before inserting into my database.

I'm now at the prototype stage, and I'm having performance problems.
It currently takes 10 hours to insert 44,000 records. This is almost 1 sec/record which is much too slow.
I have run vacuum, but that doesn't seem to make much difference.


The two tables concered are as follows:

                                      Table "orderheaderupdates"
  Attribute   |          Type          |                           Modifier
--------------+------------------------+--------------------------------------------------------------
 account      | character varying(6)   |
 delivery     | smallint               |
 thedate      | date                   |
 theorder     | integer                | not null default nextval('"orderheader_theorder_seq"'::text)
 deliverynote | character varying(10)  |
 invoicenote  | character varying(10)  |
 grnpod       | character varying(10)  |
 comments     | character varying(240) |
 dcomments    | character varying(100) |
 round        | character varying(3)   |
 tempno       | smallint               |
 status       | character(1)           |
 ordertype    | character(1)           |
 transno      | integer                |
 updated      | integer                |
 utype        | character(1)           |
 origin       | character(1)           |
Indices: orderheaderupdates_account,
         orderheaderupdates_delivery,
         orderheaderupdates_ordertype,
         orderheaderupdates_origin,
         orderheaderupdates_thedate,
         orderheaderupdates_theorder,
         orderheaderupdates_updated            


            Table "orderlinesupdates"
  Attribute   |          Type          | Modifier
--------------+------------------------+----------
 theorder     | integer                | not null
 type         | character(1)           |
 stock        | character varying(6)   |
 line         | integer                | not null
 ordercurrent | integer                |
 sellingquant | integer                |
 price        | numeric(7,2)           |
 discount     | numeric(5,2)           |
 vatrate      | smallint               |
 comment      | character varying(100) |
 updated      | integer                |
 utype        | character(1)           |
 origin       | character(1)           |
Indices: orderlinesupdates_line,
         orderlinesupdates_origin,
         orderlinesupdates_stock,
         orderlinesupdates_theorder,
         orderlinesupdates_updated


To insert into the orderlines tables I am currently performing the following:

INSERT INTO orderlinesupdates
   (theorder,type,stock,line,ordercurrent,sellingquant,price,discount,vatrate,comment,updated,utype,origin)
VALUES
   (
     (SELECT theorder FROM orderheaderupdates WHERE account='  1003' AND delivery=1 AND thedate='2002-02-26' AND ordertype='O' ORDER BY updated DESC LIMIT 1),
     'P',
     'ITEM1',
     (select coalesce
             (
                (select line from orderlinesupdates where theorder=(SELECT theorder FROM orderheaderupdates WHERE account='  1003' AND delivery=1 AND thedate='2002-02-26' AND ordertype='O'  ORDER BY updated DESC LIMIT 1) AND stock='ITEM1' limit 1),
                (select max(line)+1 from orderlinesupdates where theorder in (SELECT theorder FROM orderheader WHERE account='  1003' AND delivery=1  AND thedate='2002-02-26' AND ordertype='O' LIMIT 1)),
                1
             )
     ),
     1,
     1,
     1.1,
     0.0,
     0,
     '',
     0,
     '+',
     'C'
   );

The first subselect is to find the correct ordernumber (theorder) from the orderheader.
The second subselect tries to locate the line number, by searching for the orderline in case this line has been inserted previously.
The third subselect finds the next free line number when it is the first time this orderline has been added.
The orderline defaults to 1, if this is the first orderline entered for the matching orderheader.

I am aware that I am using almost the same select from orderheader 3 times, but I'm not sure how I can improve on this.
Would a stored procedure help ?

Running explain on the insert gives the following:


Result  (cost=0.00..0.00 rows=0 width=0)
  InitPlan
    ->  Limit  (cost=2.04..2.04 rows=1 width=8)
          ->  Sort  (cost=2.04..2.04 rows=1 width=8)
                ->  Index Scan using orderheaderupdates_account on orderheaderupdates  (cost=0.00..2.03 rows=1 width=8)
    ->  Result  (cost=0.00..0.00 rows=0 width=0)
          InitPlan
            ->  Limit  (cost=0.00..12.64 rows=1 width=4)
                  InitPlan
                    ->  Limit  (cost=2.04..2.04 rows=1 width=8)
                          ->  Sort  (cost=2.04..2.04 rows=1 width=8)
                                ->  Index Scan using orderheaderupdates_account on orderheaderupdates  (cost=0.00..2.03 rows=1 width=8)
                  ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates  (cost=0.00..12.64 rows=1 width=4)
            ->  Limit  (cost=0.00..12.64 rows=1 width=4)
                  InitPlan
                    ->  Limit  (cost=2.04..2.04 rows=1 width=8)
                          ->  Sort  (cost=2.04..2.04 rows=1 width=8)
                                ->  Index Scan using orderheaderupdates_account on orderheaderupdates  (cost=0.00..2.03 rows=1 width=8)
                  ->  Index Scan using orderlinesupdates_theorder on orderlinesupdates  (cost=0.00..12.64 rows=1 width=4)
            ->  Aggregate  (cost=45959.26..45959.26 rows=1 width=4)
                  ->  Seq Scan on orderlinesupdates  (cost=0.00..45849.26 rows=44000 width=4)
                        SubPlan
                          ->  Materialize  (cost=1.01..1.01 rows=1 width=4)
                                ->  Limit  (cost=0.00..1.01 rows=1 width=4)
                                      ->  Result  (cost=0.00..2.03 rows=2 width=4)
                                            ->  Append  (cost=0.00..2.03 rows=2 width=4)
                                                  ->  Seq Scan on orderheader  (cost=0.00..0.00 rows=1 width=4)
                                                  ->  Index Scan using orderheaderupdates_account on orderheaderupdates orderheader  (cost=0.00..2.03 rows=1 width=4)
            ->  Aggregate  (cost=45959.26..45959.26 rows=1 width=4)
                  ->  Seq Scan on orderlinesupdates  (cost=0.00..45849.26 rows=44000 width=4)
                        SubPlan
                          ->  Materialize  (cost=1.01..1.01 rows=1 width=4)
                                ->  Limit  (cost=0.00..1.01 rows=1 width=4)
                                      ->  Result  (cost=0.00..2.03 rows=2 width=4)
                                            ->  Append  (cost=0.00..2.03 rows=2 width=4)
                                                  ->  Seq Scan on orderheader  (cost=0.00..0.00 rows=1 width=4)
                                                  ->  Index Scan using orderheaderupdates_account on orderheaderupdates orderheader  (cost=0.00..2.03 rows=1 width=4)

Any ideas would be greatly appeciated.

Thanks
JohnT

Responses

pgsql-novice by date

Next:From: Tom LaneDate: 2002-04-23 14:04:49
Subject: Re: performance problems with subselects
Previous:From: Curt SampsonDate: 2002-04-23 02:34:32
Subject: Re: Hardware needed for 15,000,000 record DB?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group