Skip site navigation (1) Skip section navigation (2)

Various performance questions

From: Dror Matalon <dror(at)zapatec(dot)com>
To: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Various performance questions
Date: 2003-10-26 19:44:50
Message-ID: 20031026194449.GD2979@rlx11.zapatec.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,

We're in the process of setting up a new database server. The
application is an online rss aggregator which you can see at
www.fastbuzz.com (still running with the old hardware).

The new machine is a dual Xeon with 2 Gigs of ram 

The OS is freebsd 4.9. 

shared_buffers = 10000
sort_mem = 32768
effective_cache_size = 25520  -- freebsd forumla: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is
correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
seems like the number should be more like 1 - 1.5 Gigs.

2.  The main performance challenges are with the items table which has around
five million rows and grows at the rate of more than 100,000 rows a day.

If I do a select count(*) from the items table it take 55 - 60 seconds
to execute. I find it interesting that it takes that long whether it's
doing it the first time and fetching the pages from disk or on
subsequent request where it fetches the pages from memory.
I know that it's not touching the disks because I'm running an iostat in
a different window. Here's the explain analyze:

explain analyze select count(*) from items;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..233100.62 rows=4910762 width=0)
(actual time=0.054..30220.641 rows=4910762 loops=1)
 Total runtime: 55246.129 ms
(3 rows)

and the number of pages:

select relpages from pg_class where relname = 'items';
 relpages
----------
   183993


So does it make sense that it would take close to a minute to count the 5 million rows
even if all pages are in memory? 

3. Relpages is 183993 so file size should be  183993*8192 = 1507270656,
roughly 1.5 gig. The actual disk size is 1073741824 or roughly 1 gig.
Why the difference?



4. If I put a certain filter/condition on the query it tells me that it's doing
a sequential scan, and yet it takes less time than a full sequential
scan:

explain analyze select count(*) from items where channel < 5000;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
         Filter: (channel < 5000)
 Total runtime: 26224.703 ms


How can it do a sequential scan and apply a filter to it in less time
than the full sequential scan? Is it actually using an index without
really telling me? 


Here's the structure of the items table

    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 articlenumber | integer                  | not null
 channel       | integer                  | not null
 title         | character varying        |
 link          | character varying        |
 description   | character varying        |
 comments      | character varying(500)   |
 dtstamp       | timestamp with time zone |
 signature     | character varying(32)    |
 pubdate       | timestamp with time zone |
Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)
    "items_channel_tstamp" btree (channel, dtstamp)


5. Any other comments/suggestions on the above setup.

Thanks,

Dror

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Responses

pgsql-performance by date

Next:From: Greg StarkDate: 2003-10-27 03:49:29
Subject: Re: Various performance questions
Previous:From: Hannu KrosingDate: 2003-10-26 14:38:16
Subject: Re: [PERFORM] Slow performance with no apparent reason

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group