Re: PostgreSQL caching

From: Vitaly Belman <vitalib(at)012(dot)net(dot)il>
To: Marty Scholes <marty(at)outputservices(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Nick Barr <nicky(at)chuckie(dot)co(dot)uk>
Cc: Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>, pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL caching
Date: 2004-05-26 14:33:56
Message-ID: 1673440625.20040526173356@012.net.il
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Marty, Nick and Robert,

NB> Depending on what version of PG you are running, IN might take a while
NB> to complete. If so try an EXISTS instead

RT> A question and two experiments... what version of postgresql is this?

I am using the newer 7.5dev native Windows port. For this reason I
don't think that IN will cause any trouble (I read that this issue was
resolved in 7.4).

MS> At any rate, a query with an IN clause should help quite a bit

MS> SELECT bv_books. *
MS> FROM bv_books
MS> WHERE bv_books.book_id IN (
MS> SELECT book_id
MS> FROM bv_genres
MS> WHERE bv_bookgenres.genre_id = 5830
MS> )
MS> ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;

It looks like it helps a bit (though you meant "FROM bv_bookgenres",
right?). I can't tell you how MUCH it helped though, because of two
reasons:

1) As soon as I run a query, it is cached in the memory and I can't
really find a good way to flush it out of there to test again except a
full computer reset (shutting postmaster down doesn't help). If you
have a better idea on this, do tell me =\ (Reminding again, I am on
Windows).

2) I *think* I resolved this issue, at least for most of the genre_ids
(didn't go through them all, but tried a few with different book count
and the results looked quite good). The fault was partly mine, a few
weeks ago I increase the statistics for the genre_id column a bit too
much (from 10 to 70), I was unsure how exactly it works (and still am)
but it helped for a few genre_ids that had a high book count, yet it
also hurt the performence for the genres without as much ids. I now
halved the stastics (to 58) and almost everything looks good now.

Because of that I'll stop working on that query for a while (unless
you have some more performance tips on the subject). Big thanks to
everyone who helped.. And I might bring this issue later again, it it
still will cause too much troubles.

RT> Try reindexing i_bookgenres_genre_id and capture the explain
RT> analyze for that.

Is that's what you meant "REINDEX INDEX i_bookgenres_genre_id"? But it
returns no messages what-so-ever =\. I can EXPLAIN it either.

RT> If it doesn't help try doing set enable_indexscan = false and
RT> capture the explain analyze for that.

Here it is:

------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN
Limit (cost=41099.93..41099.96 rows=10 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
-> Sort (cost=41099.93..41100.45 rows=208 width=76) (actual time=6734.000..6734.000 rows=10 loops=1)
Sort Key: bv_books.vote_count
-> Merge Join (cost=40229.21..41091.92 rows=208 width=76) (actual time=6078.000..6593.000 rows=1993 loops=1)
Merge Cond: ("outer".book_id = "inner".book_id)
-> Sort (cost=16817.97..16818.49 rows=208 width=4) (actual time=1062.000..1062.000 rows=1993 loops=1)
Sort Key: bv_bookgenres.book_id
-> Seq Scan on bv_bookgenres (cost=0.00..16809.96 rows=208 width=4) (actual time=0.000..1047.000 rows=1993 loops=1)
Filter: (genre_id = 5830)
-> Sort (cost=23411.24..23841.04 rows=171918 width=76) (actual time=5016.000..5189.000 rows=171801 loops=1)
Sort Key: bv_books.book_id
-> Seq Scan on bv_books (cost=0.00..4048.18 rows=171918 width=76) (actual time=0.000..359.000 rows=171918 loops=1)
Total runtime: 6734.000 ms
------------------------------------------------------------------------------------------------------------------------------------------

Regards,
Vitaly Belman

ICQ: 1912453
AIM: VitalyB1984
MSN: tmdagent(at)hotmail(dot)com
Yahoo!: VitalyBe

Wednesday, May 26, 2004, 1:24:18 AM, you wrote:

MS> Vitaly,

MS> This looks like there might be some room for performance improvement...

>> MS> I didn't see the table structure, but I assume
>> MS> that the vote_avg and
>> MS> vote_count fields are in bv_bookgenres.
>>
>> I didn't understand you. vote_avg is stored in bv_books.

MS> Ok. That helps. The confusion (on my end) came from the SELECT clause
MS> of the query you provided:

>> SELECT bv_books. * ,
>> vote_avg,
>> vote_count

MS> All fields from bv_books were selected (bv_books.*) along with vote_agv
MS> and vote_count. My assumption was that vote_avg and vote_count were
MS> therefore not in bv_books.

MS> At any rate, a query with an IN clause should help quite a bit:

MS> SELECT bv_books. *
MS> FROM bv_books
MS> WHERE bv_books.book_id IN (
MS> SELECT book_id
MS> FROM bv_genres
MS> WHERE bv_bookgenres.genre_id = 5830
MS> )
MS> ORDER BY vote_avg DESC LIMIT 10 OFFSET 0;

MS> Give it a whirl.

MS> Marty

MS> ---------------------------(end of
MS> broadcast)---------------------------
MS> TIP 6: Have you searched our list archives?

MS> http://archives.postgresql.org

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mario Soto 2004-05-26 15:26:30 performance very slow
Previous Message Robert Treat 2004-05-26 13:29:06 Re: where to find out when a table was last analyzed?