Re: Hardware upgrade for a high-traffic database

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: "Jason Coene" <jcoene(at)gotfrag(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Merlin Moncure'" <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware upgrade for a high-traffic database
Date: 2004-08-12 06:49:37
Message-ID: opsclw8zdccq72hf@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 11 Aug 2004 20:29:04 -0400, Jason Coene <jcoene(at)gotfrag(dot)com> wrote:

> gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
> timestamp DESC LIMIT 5;
> QUERY
> PLAN
> ----------------------------------------------------------------------------
> -------------------------------------------------------------------
> Limit (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317
> rows=5 loops=1)
> -> Sort (cost=1608.43..1609.45 rows=407 width=8) (actual
> time=0.287..0.295 rows=5 loops=1)
> Sort Key: "timestamp"
> -> Index Scan using comments_ix_userid on comments
> (cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35
> loops=1)
> Index Cond: (userid = 51)
> Total runtime: 0.375 ms
> (6 rows)

Well, you have to read it from the bottom.
- Index Scan using comments_ix_userid :
It selects all records for your user.
rows=407 : there are 407 rows.

-> Sort (cost=1608.43..1609.45 rows=407 width=8)
It sorts them to find the 5 more recent.

So basically you grab 407 rows to return only 5, so you do 80x more disk
I/O than necessary. It is likely that posts from all users are interleaved
in the table, so this probably translates directly into 407 page fetches.

Note : EXPLAIN ANALYZE will only give good results the first time you run
it. The second time, all data is in the cache, so it looks really faster
than it is.

> gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
> userid DESC, timestamp DESC LIMIT 5;
> QUERY PLAN
> ----
> Limit (cost=0.00..19.90 rows=5 width=12) (actual time=0.040..0.076
> rows=5
> loops=1)
> -> Index Scan Backward using comments_ix_userid_timestamp on comments
> (cost=0.00..1620.25 rows=407 width=12) (actual time=0.035..0.054 rows=5
> loops=1)
> Index Cond: (userid = 51)
> Total runtime: 0.134 ms
> (4 rows)
>
> Note: This was done after adding an index on comments (userid, timestamp)

Well, this one correctly uses the index, fetches 5 rows, and returns them.

So, excluding index page hits, your unoptimized query has >400 page
fetches, and your optimized one has 5 page fetches. Still wonder why it's
faster ?

Seq scan is fast when locality of reference is good. In your case, it's
very bad.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ulrich Wisser 2004-08-12 10:37:44 Re: [GENERAL] How to know which queries are to be optimised?
Previous Message Tom Lane 2004-08-12 02:09:14 Re: Hardware upgrade for a high-traffic database