Skip site navigation (1) Skip section navigation (2)

Re: PostgreSQL caching

From: Vitaly Belman <vitalib(at)012(dot)net(dot)il>
To: Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>,Marty Scholes <marty(at)outputservices(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL caching
Date: 2004-05-25 19:53:05
Message-ID: 1371146711781.20040525225305@012.net.il (view raw or flat)
Thread:
Lists: pgsql-performance
Hello Jochem and Marty,

I guess I should have posted the table structure before =(:

Table structure + Indexes
-------------------------

CREATE TABLE public.bv_books
(
  book_id serial NOT NULL,
  book_title varchar(255) NOT NULL,
  series_id int4,
  series_index int2,
  annotation_desc_id int4,
  description_desc_id int4,
  book_picture varchar(255) NOT NULL,
  vote_avg float4 NOT NULL,
  vote_count int4 NOT NULL,
  CONSTRAINT bv_books_pkey PRIMARY KEY (book_id)
) WITH OIDS;

CREATE INDEX i_books_vote_avg
  ON public.bv_books
  USING btree
  (vote_avg);

CREATE INDEX i_books_vote_count
  ON public.bv_books
  USING btree
  (vote_count);

-------------------------

CREATE TABLE public.bv_bookgenres
(
  book_id int4 NOT NULL,
  genre_id int4 NOT NULL,
  CONSTRAINT bv_bookgenres_pkey PRIMARY KEY (book_id, genre_id),
  CONSTRAINT fk_bookgenres_book_id FOREIGN KEY (book_id) REFERENCES public.bv_books (book_id) ON UPDATE RESTRICT ON DELETE RESTRICT
) WITH OIDS;

CREATE INDEX i_bookgenres_book_id
  ON public.bv_bookgenres
  USING btree
  (book_id);

CREATE INDEX i_bookgenres_genre_id
  ON public.bv_bookgenres
  USING btree
  (genre_id);
-------------------------

MS> I didn't see the table structure, but I assume that the vote_avg and
MS> vote_count fields are in bv_bookgenres.  If no fields are actually 
MS> needed from bv_bookgenres, then the query might be constructed in a way 
MS> that only the index would be read, without loading any row data.

I didn't understand you. vote_avg is stored in bv_books.. So yes, the
only thing I need from bv_bookgenres is the id of the book, but I can't
store this info in bv_books because there is N to N relationship
between them - every book can belong to a number of genres... If
that's what you meant.

MS> I think that you mentioned this was for a web app.  Do you actually have
MS> a web page that displays 2000 rows of data?

Well.. It is all "paginated", you can access 2000 items of the data
(as there are actually 2000 books in the genre) but you only see 10
items at a time.. I mean, probably no one would go over the 2000
books, but I can't just hide them =\.

JvD> Presuming that vote_avg is a field in the table bv_bookgenres,
JvD> try a composite index on genre_id and vote_avg and then see if 
JvD> you can use the limit clause to reduce the number of loop 
JvD> iterations from 1993 to 10.

I'm afraid your idea is invalid in my case =\... Naturally I could
eventually do data coupling to gain perforemnce boost if this issue
will not be solved in other ways. I'll keep your idea in mind anyway,
thanks.

Once again thanks for you feedback.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: tmdagent(at)hotmail(dot)com
 Yahoo!: VitalyBe

Tuesday, May 25, 2004, 6:37:44 PM, you wrote:

JvD> Vitaly Belman wrote:
>> 
>> If you'll be so kind though, I'd be glad if you could spot anything to
>> speed up in this query. Here's the query and its plan that happens
>> without any caching:
>> 
>> -------------------------------------------------------------------------------------------------------------
>> QUERY
>> -----
>> SELECT     bv_books. * ,
>>            vote_avg, 
>>            vote_count 
>> FROM       bv_bookgenres, 
>>            bv_books 
>> WHERE      bv_books.book_id = bv_bookgenres.book_id AND 
>>            bv_bookgenres.genre_id = 5830
>> ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0; 
>>  
>> QUERY PLAN
>> ----------
>> Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual
>> time=7875.000..7875.000 rows=10 loops=1)
>>   ->  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual
>> time=7875.000..7875.000 rows=10 loops=1)
>>         Sort Key: bv_books.vote_avg
>>         ->  Nested Loop  (cost=0.00..2329.13 rows=214 width=76)
>> (actual time=16.000..7844.000 rows=1993 loops=1)
>>               ->  Index Scan using i_bookgenres_genre_id on
>> bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual
>> time=16.000..3585.000 rows=1993 loops=1)
>>                     Index Cond: (genre_id = 5830)
>>               ->  Index Scan using bv_books_pkey on bv_books 
>> (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1
>> loops=1993)
>>                     Index Cond: (bv_books.book_id = &quot;outer&quot;.book_id)
>> Total runtime: 7875.000 ms

JvD> Presuming that vote_avg is a field in the table bv_bookgenres, 
JvD> try a composite index on genre_id and vote_avg and then see if 
JvD> you can use the limit clause to reduce the number of loop 
JvD> iterations from 1993 to 10.

JvD> CREATE INDEX test_idx ON bv_bookgenres (genre_id, vote_avg);


JvD> The following query tries to force that execution lan and, 
JvD> presuming there is a foreign key relation between 
JvD> bv_books.book_id AND bv_bookgenres.book_id, I expect it will give
JvD> the same results, but be carefull with NULL's:

JvD> SELECT	bv_books. * ,
JvD> 	vote_avg,
JvD> 	vote_count
JvD> FROM 	(
JvD> 		SELECT	bg.*
JvD> 		FROM 	bv_bookgenres bg
JvD> 		WHERE	bg.genre_id = 5830
JvD> 		ORDER BY
JvD> 			bg.vote_avg DESC
JvD> 		LIMIT	10
JvD> 	) bv_bookgenres,
JvD> 	bv_books
JvD> WHERE	bv_books.book_id = bv_bookgenres.book_id
JvD> ORDER BY
JvD> 	vote_avg DESC
JvD> LIMIT	10;

JvD> Jochem




In response to

Responses

pgsql-performance by date

Next:From: Robert TreatDate: 2004-05-25 21:07:51
Subject: Re: Interpreting vmstat
Previous:From: Josh SacksDate: 2004-05-25 18:37:55
Subject: Not using Primary Key in query

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group