Skip site navigation (1) Skip section navigation (2)

Re: caching table/query

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Tsirkin Evgeny <tsurkin(at)mail(dot)jct(dot)ac(dot)il>
Cc: Chris Travers <chris(at)verkiel(dot)metatrontech(dot)com>,Tsirkin Evgeny <tsurkin(at)jct(dot)ac(dot)il>, pgsql-admin(at)postgresql(dot)org
Subject: Re: caching table/query
Date: 2005-07-20 09:54:07
Message-ID: 20050720095407.GA38597@winnie.fuhr.org (view raw or flat)
Thread:
Lists: pgsql-admin
On Wed, Jul 20, 2005 at 10:25:59AM +0300, Tsirkin Evgeny wrote:
> trying to reduce query time and since that is a web application i now 
> using caching with perl MLDBM::Sync::SDBM_File on application level.
> (Basically that involves quering all the table and putting it into
> MLDBM::Sync::SDBM_File and then quering it back). What is interesting
> is that while  time quering the tables from postgres is taken 0.3 sec.
> using sdbm it takes 0.1 sec.

PostgreSQL isn't likely to win a speed contest against a trivial
storage/retrieval library when the test involves simple queries and
little or no concurrency.  PostgreSQL is a full-featured database
with goals beyond just "make SELECT statements as fast as possible."

> Actually i am not anymore sure that postgres does NOT cache table in memory,
> maybe it does and i don't know about it?

As I mentioned in an earlier message, you could enable statistics
gathering and use the statistics views to see whether your queries
are being satistified from the buffer cache or if they require a
call to the operating system:

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

The first time you query a table it will probably need to be fetched
from disk (or from the operating system's cache); subsequent queries
will likely be much faster because PostgreSQL will have stored the
fetched pages in its buffer cache.  Example:

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';
 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
----------------+---------------+---------------+--------------
              0 |             0 |             0 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=73.686..73.698 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 73.934 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit 
FROM pg_statio_user_tables 
WHERE relname = 'foo';
 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
----------------+---------------+---------------+--------------
              1 |             0 |             3 |            0
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = 12345;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=18) (actual time=0.072..0.083 rows=1 loops=1)
   Index Cond: (id = 12345)
 Total runtime: 0.237 ms
(3 rows)

SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'foo';
 heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit 
----------------+---------------+---------------+--------------
              1 |             1 |             3 |            3
(1 row)

Notice how much faster the second query was, even though it was
identical to the first.  As you can see from the statistics view,
the second query's page fetches were all "hits," meaning they were
retrieved from the buffer cache.

If you want to analyze performance, then make sure your tests mimic
the activity you expect to see on the production system, such as
number of concurrent connections; amount of select, insert, update,
and delete activity; similar data sets in terms of size and
distribution; representative queries; etc.  Use PostgreSQL's and
the operating system's instrumentation to identify performance
bottlenecks, and then look for remedies.

BTW, pgsql-performance might be a more appropriate list for this
thread.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

In response to

pgsql-admin by date

Next:From: 梁炳場Date: 2005-07-20 11:25:04
Subject: su permission denied
Previous:From: Dawid KuroczkoDate: 2005-07-20 09:24:07
Subject: Re: backing up without lock ?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group