Re: PostgreSQL caching

From: Vitaly Belman <vitalib(at)012(dot)net(dot)il>
To: Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>, Marty Scholes <marty(at)outputservices(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL caching
Date: 2004-05-25 19:53:05
Message-ID: 1371146711781.20040525225305@012.net.il
Lists: pgsql-performance

Hello Jochem and Marty,

I guess I should have posted the table structure before =(:

Table structure + Indexes
-------------------------

CREATE TABLE public.bv_books
(
    book_id             serial       NOT NULL,
    book_title          varchar(255) NOT NULL,
    series_id           int4,
    series_index        int2,
    annotation_desc_id  int4,
    description_desc_id int4,
    book_picture        varchar(255) NOT NULL,
    vote_avg            float4       NOT NULL,
    vote_count          int4         NOT NULL,
    CONSTRAINT bv_books_pkey PRIMARY KEY (book_id)
) WITH OIDS;

CREATE INDEX i_books_vote_avg
ON public.bv_books
USING btree
(vote_avg);

CREATE INDEX i_books_vote_count
ON public.bv_books
USING btree
(vote_count);

-------------------------

CREATE TABLE public.bv_bookgenres
(
    book_id  int4 NOT NULL,
    genre_id int4 NOT NULL,
    CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
    CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id)
        REFERENCES public.bv_books (book_id)
        ON UPDATE RESTRICT ON DELETE RESTRICT
) WITH OIDS;

CREATE INDEX i_bookgenres_book_id
ON public.bv_bookgenres
USING btree
(book_id);

CREATE INDEX i_bookgenres_genre_id
ON public.bv_bookgenres
USING btree
(genre_id);
-------------------------

MS> I didn't see the table structure, but I assume that the vote_avg and
MS> vote_count fields are in bv_bookgenres. If no fields are actually
MS> needed from bv_bookgenres, then the query might be constructed in a way
MS> that only the index would be read, without loading any row data.

I'm not sure I understood you. vote_avg is stored in bv_books.. So yes,
the only thing I need from bv_bookgenres is the book's id, but I can't
store that info in bv_books because there is an N-to-N relationship
between them - every book can belong to a number of genres... If
that's what you meant.
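
If you mean covering the lookup with an index on bv_bookgenres alone,
I guess something like the composite index below is the idea (an
untested sketch only; the index name is made up, and as far as I know
PostgreSQL may still have to visit the table rows for visibility
checks):

-- Untested sketch: genre_id leads so the WHERE clause can use it, and
-- book_id rides along in the same index entry. The name
-- i_bookgenres_genre_book is just an example.
CREATE INDEX i_bookgenres_genre_book
    ON public.bv_bookgenres
    USING btree
    (genre_id, book_id);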

MS> I think that you mentioned this was for a web app. Do you actually have
MS> a web page that displays 2000 rows of data?

Well.. It is all "paginated": you can access all 2000 items of the data
(as there are actually 2000 books in the genre), but you only see 10
items at a time. I mean, probably no one would page through all 2000
books, but I can't just hide them =\.
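
For reference, each page just re-runs the same query with a different
OFFSET, so page 3 of genre 5830 would look roughly like this (a sketch
of what the application sends, not the exact code):

-- Roughly what the application runs for page 3 (rows 21-30) of genre 5830;
-- only the OFFSET value changes from page to page.
SELECT bv_books.*, vote_avg, vote_count
  FROM bv_bookgenres, bv_books
 WHERE bv_books.book_id = bv_bookgenres.book_id
   AND bv_bookgenres.genre_id = 5830
 ORDER BY vote_avg DESC
 LIMIT 10 OFFSET 20;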

JvD> Presuming that vote_avg is a field in the table bv_bookgenres,
JvD> try a composite index on genre_id and vote_avg and then see if
JvD> you can use the limit clause to reduce the number of loop
JvD> iterations from 1993 to 10.

I'm afraid your idea won't work in my case =\, since vote_avg is
stored in bv_books rather than bv_bookgenres. Naturally, I could
eventually denormalize the data (e.g. copy vote_avg into bv_bookgenres)
to gain a performance boost if this issue can't be solved in other
ways. I'll keep your idea in mind anyway, thanks.
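
If I do end up denormalizing, I imagine it would look roughly like
this (an untested sketch only: copy vote_avg into bv_bookgenres so
that a composite index like the one you suggested becomes possible,
at the cost of keeping the copy in sync):

-- Untested sketch of the denormalization: add a copy of vote_avg to
-- bv_bookgenres and build the composite index on (genre_id, vote_avg).
-- The copied column would need to be kept in sync (e.g. via a trigger)
-- whenever bv_books.vote_avg changes.
ALTER TABLE public.bv_bookgenres ADD COLUMN vote_avg float4;

UPDATE public.bv_bookgenres bg
   SET vote_avg = b.vote_avg
  FROM public.bv_books b
 WHERE b.book_id = bg.book_id;

CREATE INDEX i_bookgenres_genre_vote
    ON public.bv_bookgenres
    USING btree
    (genre_id, vote_avg);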

Once again, thanks for your feedback.

Regards,
Vitaly Belman

ICQ: 1912453
AIM: VitalyB1984
MSN: tmdagent(at)hotmail(dot)com
Yahoo!: VitalyBe

Tuesday, May 25, 2004, 6:37:44 PM, you wrote:

JvD> Vitaly Belman wrote:
>>
>> If you'll be so kind though, I'd be glad if you could spot anything to
>> speed up in this query. Here's the query and its plan that happens
>> without any caching:
>>
>> -------------------------------------------------------------------------------------------------------------
>> QUERY
>> -----
>> SELECT bv_books. * ,
>> vote_avg,
>> vote_count
>> FROM bv_bookgenres,
>> bv_books
>> WHERE bv_books.book_id = bv_bookgenres.book_id AND
>> bv_bookgenres.genre_id = 5830
>> ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;
>>
>> QUERY PLAN
>> ----------
>> Limit (cost=2337.41..2337.43 rows=10 width=76) (actual
>> time=7875.000..7875.000 rows=10 loops=1)
>> -> Sort (cost=2337.41..2337.94 rows=214 width=76) (actual
>> time=7875.000..7875.000 rows=10 loops=1)
>> Sort Key: bv_books.vote_avg
>> -> Nested Loop (cost=0.00..2329.13 rows=214 width=76)
>> (actual time=16.000..7844.000 rows=1993 loops=1)
>> -> Index Scan using i_bookgenres_genre_id on
>> bv_bookgenres (cost=0.00..1681.54 rows=214 width=4) (actual
>> time=16.000..3585.000 rows=1993 loops=1)
>> Index Cond: (genre_id = 5830)
>> -> Index Scan using bv_books_pkey on bv_books
>> (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1
>> loops=1993)
>> Index Cond: (bv_books.book_id = "outer".book_id)
>> Total runtime: 7875.000 ms

JvD> Presuming that vote_avg is a field in the table bv_bookgenres,
JvD> try a composite index on genre_id and vote_avg and then see if
JvD> you can use the limit clause to reduce the number of loop
JvD> iterations from 1993 to 10.

JvD> CREATE INDEX test_idx ON bv_bookgenres (genre_id, vote_avg);

JvD> The following query tries to force that execution lan and,
JvD> presuming there is a foreign key relation between
JvD> bv_books.book_id AND bv_bookgenres.book_id, I expect it will give
JvD> the same results, but be carefull with NULL's:

JvD> SELECT bv_books. * ,
JvD> vote_avg,
JvD> vote_count
JvD> FROM (
JvD> SELECT bg.*
JvD> FROM bv_bookgenres bg
JvD> WHERE bg.genre_id = 5830
JvD> ORDER BY
JvD> bg.vote_avg DESC
JvD> LIMIT 10
JvD> ) bv_bookgenres,
JvD> bv_books
JvD> WHERE bv_books.book_id = bv_bookgenres.book_id
JvD> ORDER BY
JvD> vote_avg DESC
JvD> LIMIT 10;

JvD> Jochem
