Skip site navigation (1) Skip section navigation (2)

Performance problems with multiple layers of functions

From: Svenne Krap <svenne(at)krap(dot)dk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance problems with multiple layers of functions
Date: 2006-03-24 12:49:17
Message-ID: 4423EACD.80706@krap.dk (view raw or flat)
Thread:
Lists: pgsql-performance
Hi there.

I have hit a edge in the planning and I hope you can help.

The system uses a lot of stored procedures to move as much of the 
intelligence into the database layer as possible.

My (development) query looks like and runs reasonably fast:

explain analyze select dataset_id, entity, sum(amount) from 
entrydata_current where  flow_direction in (select * from 
outflow_direction(dataset_id)) and dataset_id in (122,112,125,89,111) 
group by dataset_id, entity;
                                                                           
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=918171.00..918171.30 rows=24 width=19) (actual 
time=11533.297..11533.340 rows=50 loops=1)
   ->  Bitmap Heap Scan on entrydata_current  (cost=676.72..917736.04 
rows=57994 width=19) (actual time=23.921..11425.373 rows=37870 loops=1)
         Recheck Cond: ((dataset_id = 122) OR (dataset_id = 112) OR 
(dataset_id = 125) OR (dataset_id = 89) OR (dataset_id = 111))
         Filter: (subplan)
         ->  BitmapOr  (cost=676.72..676.72 rows=117633 width=0) (actual 
time=15.765..15.765 rows=0 loops=1)
               ->  Bitmap Index Scan on entrydata_current_dataset_idx  
(cost=0.00..83.97 rows=14563 width=0) (actual time=1.881..1.881 
rows=13728 loops=1)
                     Index Cond: (dataset_id = 122)
               ->  Bitmap Index Scan on entrydata_current_dataset_idx  
(cost=0.00..156.12 rows=27176 width=0) (actual time=3.508..3.508 
rows=25748 loops=1)
                     Index Cond: (dataset_id = 112)
               ->  Bitmap Index Scan on entrydata_current_dataset_idx  
(cost=0.00..124.24 rows=21498 width=0) (actual time=2.729..2.729 
rows=20114 loops=1)
                     Index Cond: (dataset_id = 125)
               ->  Bitmap Index Scan on entrydata_current_dataset_idx  
(cost=0.00..102.20 rows=17771 width=0) (actual time=2.351..2.351 
rows=17344 loops=1)
                     Index Cond: (dataset_id = 89)
               ->  Bitmap Index Scan on entrydata_current_dataset_idx  
(cost=0.00..210.19 rows=36625 width=0) (actual time=5.292..5.292 
rows=37118 loops=1)
                     Index Cond: (dataset_id = 111)
         SubPlan
           ->  Function Scan on outflow_direction  (cost=0.00..12.50 
rows=1000 width=4) (actual time=0.093..0.095 rows=4 loops=114052)
 Total runtime: 11540.506 ms
(18 rows)

The problem is, that the application should not need to know the five 
dataset_ids (it will always know one - its own). So I make a function to 
return the five ids and then the query looks like:

explain select dataset_id, entity, sum(amount) from entrydata_current 
where  flow_direction in (select * from outflow_direction(dataset_id)) 
and dataset_id in (select * from get_dataset_ids(122)) group by 
dataset_id, entity;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=24672195.68..24672203.88 rows=656 width=19)
   ->  Hash IN Join  (cost=15.00..24660005.45 rows=1625364 width=19)
         Hash Cond: ("outer".dataset_id = "inner".get_dataset_ids)
         ->  Index Scan using entrydata_current_dataset_idx on 
entrydata_current  (cost=0.00..24558405.20 rows=1625364 width=19)
               Filter: (subplan)
               SubPlan
                 ->  Function Scan on outflow_direction  
(cost=0.00..12.50 rows=1000 width=4)
         ->  Hash  (cost=12.50..12.50 rows=1000 width=4)
               ->  Function Scan on get_dataset_ids  (cost=0.00..12.50 
rows=1000 width=4)
(9 rows)

which does not return within 10 minutes - which is unacceptable.

Is there any way to get a better plan for the second ? The planner 
should really see the two queries as equal as there is no dependencies 
between the outer query and get_dataset_ids (isn't it called constant 
folding?).

Thanks in advance

Svenne

Responses

pgsql-performance by date

Next:From: Jim C. NasbyDate: 2006-03-24 12:52:45
Subject: Re: Array performance
Previous:From: Ruben Rubio ReyDate: 2006-03-24 12:41:50
Subject: Array performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group