Re: Hardware upgrade for a high-traffic database

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jason Coene" <jcoene(at)gotfrag(dot)com>
Cc: "'Merlin Moncure'" <merlin(dot)moncure(at)rcsonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hardware upgrade for a high-traffic database
Date: 2004-08-12 02:09:14
Message-ID: 23019.1092276554@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Jason Coene" <jcoene(at)gotfrag(dot)com> writes:
> We have a lot of:

> SELECT whatever
> FROM ourtable
> WHERE field1 = X
> AND field2 = Y
> AND field3 = Z
> ORDER BY id DESC
> LIMIT 5

> With indexes:

> ourtable(id)
> ourtable(field1, field2, field3)

> Is it standard procedure with postgres to include any fields listed in WHERE
> in the ORDER BY, and create a single index for only the ORDER BY fields (in
> order of appearance, of course)?

It depends. If the X/Y/Z constraint is already pretty selective, then
it seems sufficient to me to pick up the matching rows (using the
3-field index), sort them by id, and take the first 5. The case where
the extra-index-column trick is useful is where the WHERE clause *isn't*
real selective and so a lot of rows would have to be sorted. In your
previous example, I imagine you have a lot of prolific posters and so
"all posts by userid 42" can be a nontrivial set. The double-column
index lets you skip the sort and just pull out the required rows by
scanning from the end of the range of userid = 42 entries.

> gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
> timestamp DESC LIMIT 5;
> QUERY PLAN
> ----------------------------------------------------------------------------
> -------------------------------------------------------------------
> Limit (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317
> rows=5 loops=1)
> -> Sort (cost=1608.43..1609.45 rows=407 width=8) (actual
> time=0.287..0.295 rows=5 loops=1)
> Sort Key: "timestamp"
> -> Index Scan using comments_ix_userid on comments
> (cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35
> loops=1)
> Index Cond: (userid = 51)
> Total runtime: 0.375 ms
> (6 rows)

This example looks fine, but since userid 51 evidently only has 35
posts, there's not much time needed to read 'em all and sort 'em. The
place where the double-column index will win big is on userids with
hundreds of posts.

You have to keep in mind that each index costs time to maintain during
inserts/updates. So adding an index just because it makes a few queries
a little faster probably isn't a win. You need to make tradeoffs.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre-Frédéric Caillaud 2004-08-12 06:49:37 Re: Hardware upgrade for a high-traffic database
Previous Message Jason Coene 2004-08-12 00:29:04 Re: Hardware upgrade for a high-traffic database