Skip site navigation (1) Skip section navigation (2)

Performance hit on large row counts

From: David Scott <davids(at)apptechsys(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance hit on large row counts
Date: 2005-12-26 21:03:34
Message-ID: 43B05AA6.9020902@apptechsys.com (view raw or flat)
Thread:
Lists: pgsql-performance
We are trying to ascertain if we are up against the limits of what 
postgres can accomplish without having the tables clustered.  We would 
prefer not to have to cluster our tables because according to the 
documentation this is a one time operation and not maintained.  Is there 
any other performance tweaks that can be done to avoid clustering?  Is 
there any way to force the cluster maintenance (even with a performance 
hit on load)? 
We are aware that there is a minimum time that is required to resolve 
the index values against the table to ascertain that they are live rows, 
and we believe we are circumventing that time to some extent by taking 
advantage of the rows being in physical order with the cluster.  So does 
this lead us to the conclusion that the differences in the query times 
is how long is takes us to check on disk whether or not these rows are live?

Thanks for any help, thoughts, tips or suggestions.


All of these commands are after a vacuum full analyze and the config 
file is attached.  Different values were used for the queries so no 
caching would confuse our stats.  The box is running gentoo with 
postgres 8.1.0, has raid 0, 9 gigs of ram, 2 hyperthreaded procs, x86_64.

/Three tables with row counts:/
    lookup1.count =  3,306,930
    lookup2.count = 4,189,734
    stuff.count = 3,423,994

/The first attempt (after index adjustments, no hits to cached results)/

explain analyze select col2, count(*) as cnt from stuff where col1 = 
56984 group by col2

 HashAggregate  (cost=14605.68..14605.88 rows=16 width=4) (actual 
time=6980.752..6985.893 rows=6389 loops=1)
   ->  Bitmap Heap Scan on stuff  (cost=60.97..14571.44 rows=6848 
width=4) (actual time=371.215..6965.742 rows=6389 loops=1)
         Recheck Cond: (col1 = 56984)
         ->  Bitmap Index Scan on stuff_pair_idx  (cost=0.00..60.97 
rows=6848 width=0) (actual time=361.237..361.237 rows=6389 loops=1)
               Index Cond: (col1 = 56984)
 Total runtime: 6988.105 ms

/After clustering:/

explain analyze select col2, count(*) as cnt from stuff where col1 = 
3540634 group by col2;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1399.62..1399.63 rows=1 width=4) (actual 
time=11.376..15.282 rows=5587 loops=1)
   ->  Bitmap Heap Scan on stuff  (cost=4.36..1397.68 rows=389 width=4) 
(actual time=1.029..4.538 rows=5587 loops=1)
         Recheck Cond: (col1 = 3540634)
         ->  Bitmap Index Scan on stuff_col1_idx  (cost=0.00..4.36 
rows=389 width=0) (actual time=1.003..1.003 rows=5587 loops=1)
               Index Cond: (col1 = 3540634)
 Total runtime: 17.113 ms



/Using this in the next layer of querying:/

explain analyze  SELECT col1,col2, value AS val,  
coalesce(coalesce(lookup1.col3, lookup2.col3),0) AS dollars FROM (select 
col1, col2, value from stuff where col1 = 95350) stuff LEFT JOIN  
lookup1 ON (stuff.col2 = lookup1.pkey) LEFT JOIN lookup2 ON (stuff.col2 
= lookup2.pkey);

 Nested Loop Left Join  (cost=0.00..10325.15 rows=857 width=20) (actual 
time=84.223..9306.228 rows=2296 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..5183.25 rows=857 width=16) 
(actual time=56.623..1710.655 rows=2296 loops=1)
         ->  Index Scan using stuff_col1_idx on stuff  (cost=0.00..21.57 
rows=857 width=12) (actual time=40.531..57.160 rows=2296 loops=1)
               Index Cond: (col1 = 4528383)
         ->  Index Scan using lookup2_pkey on lookup2  (cost=0.00..6.01 
rows=1 width=8) (actual time=0.717..0.717 rows=0 loops=2296)
               Index Cond: ("outer".col2 = lookup2.pkey)
   ->  Index Scan using lookup1_pkey on lookup1  (cost=0.00..5.99 rows=1 
width=8) (actual time=3.304..3.305 rows=1 loops=2296)
         Index Cond: ("outer".col2 = lookup1.pkey)
 Total runtime: 9307.569 ms

/After clustering the two left join tables (lookup1 and lookup2):/

explain analyze  SELECT col1,col2, value AS val,  
coalesce(coalesce(lookup1.col3, lookup2.col3),0) AS dollars FROM (select 
col1, col2, value from stuff where col1 = 95350) stuff LEFT JOIN  
lookup1 ON (stuff.col2 = lookup1.pkey) LEFT JOIN lookup2 ON (stuff.col2 
= lookup2.pkey);

 Nested Loop Left Join  (cost=0.00..10325.15 rows=857 width=20) (actual 
time=24.444..84.114 rows=1727 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..5163.47 rows=857 width=16) 
(actual time=24.392..62.787 rows=1727 loops=1)
         ->  Index Scan using stuff_col1_idx on stuff  (cost=0.00..21.57 
rows=857 width=12) (actual time=24.332..27.455 rows=1727 loops=1)
               Index Cond: (col1 = 95350)
         ->  Index Scan using lookup1_pkey on lookup1  (cost=0.00..5.99 
rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=1727)
               Index Cond: ("outer".col2 = lookup1.pkey)
   ->  Index Scan using lookup2_pkey on lookup2  (cost=0.00..6.01 rows=1 
width=8) (actual time=0.010..0.010 rows=0 loops=1727)
         Index Cond: ("outer".col2 = lookup2.pkey)
 Total runtime: 84.860 ms



Attachment: postgresql.conf
Description: text/plain (13.3 KB)

Responses

pgsql-performance by date

Next:From: Ivan VorasDate: 2005-12-26 21:32:13
Subject: Bitmap indexes etc.
Previous:From: Luke LonerganDate: 2005-12-26 18:50:34
Subject: Re: What's the best hardver for PostgreSQL 8.1?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group