Re: PostgreSQL caching

From: Vitaly Belman <vitalib(at)012(dot)net(dot)il>
To: Richard Huxton <dev(at)archonet(dot)com>, Rosser Schwarz <rschwarz(at)totalcardinc(dot)com>
Cc: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PostgreSQL caching
Date: 2004-05-21 17:33:37
Message-ID: 20792805375.20040521203337@012.net.il
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Richard and Rosser,

Thank you both for the answers.

I tried creating a semi cache by running all the queries and indeed it
worked and I might use such way in the future if needed, yet though, I
can't help but to feel it isn't exactly the right way to work around
this problem. If I do, I might as well increase the effective_cache
value as pointed by the config docs.

Also on this subject, previously I was only fighting with queries that
run poorly even if you run them 10 days in the row.. They don't seem
to be cached at all. Does it cahce the query result? If so, it should
make any query run almost immediately the second time. If it doesn't
cache the actual result, what does it cache?

If you'll be so kind though, I'd be glad if you could spot anything to
speed up in this query. Here's the query and its plan that happens
without any caching:

-------------------------------------------------------------------------------------------------------------
QUERY
-----
SELECT bv_books. * ,
vote_avg,
vote_count
FROM bv_bookgenres,
bv_books
WHERE bv_books.book_id = bv_bookgenres.book_id AND
bv_bookgenres.genre_id = 5830
ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;

QUERY PLAN
----------
Limit (cost=2337.41..2337.43 rows=10 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
-> Sort (cost=2337.41..2337.94 rows=214 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
Sort Key: bv_books.vote_avg
-> Nested Loop (cost=0.00..2329.13 rows=214 width=76) (actual time=16.000..7844.000 rows=1993 loops=1)
-> Index Scan using i_bookgenres_genre_id on bv_bookgenres (cost=0.00..1681.54 rows=214 width=4) (actual time=16.000..3585.000 rows=1993 loops=1)
Index Cond: (genre_id = 5830)
-> Index Scan using bv_books_pkey on bv_books (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1 loops=1993)
Index Cond: (bv_books.book_id = "outer".book_id)
Total runtime: 7875.000 ms
-------------------------------------------------------------------------------------------------------------

Some general information:

bv_books holds 17000 rows.
bv_bookgenres holds 938797 rows.

Using the WHERE (genre_id == 5838) it cuts the number of book_ids to
around 2000.

As far as indexes are concerned, there's an index on all the rows
mentioned in the query (as can be seen from the explain), including
the vote_avg row.

Thanks and regards,
Vitaly Belman

ICQ: 1912453
AIM: VitalyB1984
MSN: tmdagent(at)hotmail(dot)com
Yahoo!: VitalyBe

Friday, May 21, 2004, 6:34:12 PM, you wrote:

RH> Vitaly Belman wrote:
>> Hello,
>>
>> I have the following problem:
>>
>> When I run some query after I just run the Postmaster, it takse
>> several seconds to execute (sometimes more than 10), if I rerun it
>> again afterwards, it takes mere milliseconds.
>>
>> So, I guess it has to do with PostgreSQL caching.. But how exactly
>> does it work? What does it cache? And how can I control it?

RH> There are two areas of cache - PostgreSQL's shared buffers and the
RH> operating system's disk-cache. You can't directly control what data is
RH> cached, it just keeps track of recently used data. It sounds like PG
RH> isn't being used for a while so your OS decides to use its cache for
RH> webserver files.

>> I would like to load selected information in the memory before a user
>> runs the query. Can I do it somehow? As PostgreSQL is used in my case
>> as webserver, it isn't really helping if the user has to wait 10
>> seconds every time he goes to a new page (even if refreshing the page
>> would be really quick, sine Postgre already loaded the data to
>> memory).

RH> If you could "pin" data in the cache it would run quicker, but at the
RH> cost of everything else running slower.

RH> Suggested steps:
RH> 1. Read the configuration/tuning guide at:
RH> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
RH> 2. Post a sample query/explain analyse that runs very slowly when not
RH> cached.
RH> 3. If needs be, you can write a simple timed script that performs a
RH> query. Or, the autovacuum daemon might be what you want.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rod Taylor 2004-05-21 18:33:54 Re: PostgreSQL caching
Previous Message Chris Browne 2004-05-21 17:22:50 Re: PostgreSQL caching