Skip site navigation (1) Skip section navigation (2)

Re: Hardware upgrade for a high-traffic database

From: "Jason Coene" <jcoene(at)gotfrag(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Merlin Moncure'" <merlin(dot)moncure(at)rcsonline(dot)com>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hardware upgrade for a high-traffic database
Date: 2004-08-12 00:29:04
Message-ID: 200408120029.i7C0T30E002389@mail.gotfrag.com (view raw or flat)
Thread:
Lists: pgsql-performance
> You mean you are doing
>     SELECT ... WHERE userid = 42 ORDER BY timestamp DESC LIMIT 5;
> and hoping that separate indexes on userid and timestamp will get the
> job done?  They won't.  There are only two possible plans for this,
> neither very good: select all of user 42's posts and sort them, or
> scan timewise backwards through *all* posts looking for the last 5 from
> user 42.

Wow!  I did try the method you state below (including the WHERE restricted
column in the sort by, and creating a two-column index), and it did execute
much faster (even on odd userid's to avoid cached results as much as
possible).

We have a lot of:

SELECT whatever
	FROM ourtable
	WHERE field1 = X
	AND field2 = Y
	AND field3 = Z
	ORDER BY id DESC
	LIMIT 5

With indexes:

ourtable(id)
ourtable(field1, field2, field3)

Is it standard procedure with postgres to include any fields listed in WHERE
in the ORDER BY, and create a single index for only the ORDER BY fields (in
order of appearance, of course)?

> 
> If you do this enough to justify a specialized index, I would suggest a
> two-column index on (userid, timestamp).  You will also need to tweak
> the query, because the planner is not quite smart enough to deduce that
> such an index is applicable to the given sort order:
>     SELECT ... WHERE userid = 42 ORDER BY userid DESC, timestamp DESC
> LIMIT 5;
> This should generate an index-scan-backwards plan that will execute nigh
> instantaneously, because it will only fetch the rows you really want.
> 
> You might or might not be able to drop the separate indexes on userid
> and timestamp, depending on what other queries you might have that need
> them.  But you should be paying attention to what plans you are really
> getting (see EXPLAIN) rather than just assuming that some indexes chosen
> at random will do what you need.
> 
> 			regards, tom lane
> 

We do many varied queries on nearly every table - our data is highly
relational, and we have a lot of indexes.  I thought the planner would pick
up the right index via constraints and not require them in ORDER BY...
EXPLAIN ANALYZE says that the indexes are being used, ala:

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
timestamp DESC LIMIT 5;
                                                                  QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------------
 Limit  (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317
rows=5 loops=1)
   ->  Sort  (cost=1608.43..1609.45 rows=407 width=8) (actual
time=0.287..0.295 rows=5 loops=1)
         Sort Key: "timestamp"
         ->  Index Scan using comments_ix_userid on comments
(cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35
loops=1)
               Index Cond: (userid = 51)
 Total runtime: 0.375 ms
(6 rows)

Is this the wrong procedure?  Your suggested syntax seems much more
efficient, but I don't quite understand exactly why, as PG is using our
existing indexes...

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
userid DESC, timestamp DESC LIMIT 5;
 
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----
 Limit  (cost=0.00..19.90 rows=5 width=12) (actual time=0.040..0.076 rows=5
loops=1)
   ->  Index Scan Backward using comments_ix_userid_timestamp on comments
(cost=0.00..1620.25 rows=407 width=12) (actual time=0.035..0.054 rows=5
loops=1)
         Index Cond: (userid = 51)
 Total runtime: 0.134 ms
(4 rows)

Note: This was done after adding an index on comments (userid, timestamp)

Regards,

Jason


In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2004-08-12 02:09:14
Subject: Re: Hardware upgrade for a high-traffic database
Previous:From: Josh BerkusDate: 2004-08-12 00:25:55
Subject: Re: Buld Insert and Index use.

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group