Performance hit on loading from HD

From: Vitaly Belman <vitalyb(at)gmail(dot)com>
To: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Performance hit on loading from HD
Date: 2004-08-28 17:41:51
Message-ID: fa96e3c604082810411b238e8c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a problem with certain queries performance. Trouble is that
while their execution plan is pretty good and mostly their execution
is great as well, their FIRST execution time (that is after you mount
the database) is abysmal.

I realize that it happens due to the loading of data from the HD to
the memory/swap and it wouldn't be too bad if I just could make the
data stay in the memory, sadly, after a few minutes the data is back
on the HD and running the query again results the same bad
performance.

Here's a query for example, though as I said, this problem occurs in
different queries.

---------------------------------------------------------------------------------------

SELECT *
FROM bv_bookgenres, bv_books
WHERE bv_books.book_id = bv_bookgenres.book_id and genre_id = 987
ORDER BY vote_avg limit 10

---------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------

QUERY PLAN
Limit (cost=2601.16..2601.18 rows=10 width=193) (actual
time=4735.097..4735.107 rows=10 loops=1)
-> Sort (cost=2601.16..2601.70 rows=219 width=193) (actual
time=4735.092..4735.095 rows=10 loops=1)
Sort Key: bv_books.vote_avg
-> Nested Loop (cost=0.00..2592.64 rows=219 width=193)
(actual time=74.615..4719.147 rows=1877 loops=1)
-> Index Scan using i_bookgenres_genre_id on
bv_bookgenres (cost=0.00..1707.03 rows=218 width=4) (actual
time=74.540..2865.366 rows=1877 loops=1)
Index Cond: (genre_id = 987)
-> Index Scan using bv_books_pkey on bv_books
(cost=0.00..4.05 rows=1 width=193) (actual time=0.968..0.971 rows=1
loops=1877)
Index Cond: (bv_books.book_id = "outer".book_id)
Total runtime: 4735.726 ms

---------------------------------------------------------------------------------------

If I run the query again after it just finished running I would get
the following timing:

---------------------------------------------------------------------------------------

Limit (cost=3937.82..3937.84 rows=10 width=204)
-> Sort (cost=3937.82..3938.38 rows=223 width=204)
Sort Key: bv_books.vote_avg
-> Nested Loop (cost=0.00..3929.12 rows=223 width=204)
-> Index Scan using i_bookgenres_genre_id on
bv_bookgenres (cost=0.00..1731.94 rows=222 width=8)
Index Cond: (genre_id = 987)
-> Index Scan using bv_books_pkey on bv_books
(cost=0.00..9.88 rows=1 width=196)
Index Cond: (bv_books.book_id = "outer".book_id)

---------------------------------------------------------------------------------------

Before going on, I should say that I am running PostgreSQL on CoLinux
under Windows 2000. From what I read/tested, the CoLinux performance
on CoLinux are matching to the performance of VMWare. Yet, I'm still
wondering if it is a side effect of my development setup or if some of
my settings are indeed wrong.

With that said, here is the information of the tables:

---------------------------------------------------------------------------------------

CREATE TABLE bv_books
(
book_id serial NOT NULL,
book_name varchar(255) NOT NULL,
series_id int4,
series_index int2,
annotation_desc_id int4,
description_desc_id int4,
book_picture varchar(255) NOT NULL,
reviews_error int4 NOT NULL,
vote_avg float4 NOT NULL,
vote_count int4 NOT NULL,
book_genre int4[],
book_name_fulltext tsearch2.tsvector,
book_name_fulltext2 tsearch2.tsvector,
CONSTRAINT bv_books_pkey PRIMARY KEY (book_id),
CONSTRAINT fk_books_annotation_desc_id FOREIGN KEY
(annotation_desc_id) REFERENCES bv_descriptions (description_id) ON
UPDATE RESTRICT ON DELETE SET NULL,
CONSTRAINT fk_books_description_desc_id FOREIGN KEY
(description_desc_id) REFERENCES bv_descriptions (description_id) ON
UPDATE RESTRICT ON DELETE SET NULL,
CONSTRAINT fk_books_series_id FOREIGN KEY (series_id) REFERENCES
bv_series (series_id) ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITH OIDS;

CREATE TABLE bv_bookgenres
(
book_id int4 NOT NULL,
genre_id int4 NOT NULL,
CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES
bv_books (book_id) ON UPDATE RESTRICT ON DELETE CASCADE,
CONSTRAINT fk_bookgenres_genre_id FOREIGN KEY (genre_id) REFERENCES
bv_genres (genre_id) ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITH OIDS;

---------------------------------------------------------------------------------------

As far as the data is concerned, there are around 170,000 rows in
bv_books and 940,000 in bv_bookgenres. There are also btree index on
all the relevant (to the query) fields.

I can live up with the fact that the data has to be loaded the first
time it is accessed, but is it possible to make it stick longer in the
memory? Is it the fact that CoLinux gets only 128MB of RAM? Or one of
my settings should be fixed?

Thanks

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2004-08-28 21:34:23 Re: Performance hit on loading from HD
Previous Message Gaetano Mendola 2004-08-28 16:35:18 Re: ill-planned queries inside a stored procedure