Re: Hardware upgrade for a high-traffic database

From: "Jason Coene" <jcoene(at)gotfrag(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Merlin Moncure'" <merlin(dot)moncure(at)rcsonline(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hardware upgrade for a high-traffic database
Date: 2004-08-12 00:29:04
Message-ID: 200408120029.i7C0T30E002389@mail.gotfrag.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> You mean you are doing
> SELECT ... WHERE userid = 42 ORDER BY timestamp DESC LIMIT 5;
> and hoping that separate indexes on userid and timestamp will get the
> job done? They won't. There are only two possible plans for this,
> neither very good: select all of user 42's posts and sort them, or
> scan timewise backwards through *all* posts looking for the last 5 from
> user 42.

Wow! I did try the method you state below (including the WHERE restricted
column in the sort by, and creating a two-column index), and it did execute
much faster (even on odd userid's to avoid cached results as much as
possible).

We have a lot of:

SELECT whatever
FROM ourtable
WHERE field1 = X
AND field2 = Y
AND field3 = Z
ORDER BY id DESC
LIMIT 5

With indexes:

ourtable(id)
ourtable(field1, field2, field3)

Is it standard procedure with postgres to include any fields listed in WHERE
in the ORDER BY, and create a single index for only the ORDER BY fields (in
order of appearance, of course)?

>
> If you do this enough to justify a specialized index, I would suggest a
> two-column index on (userid, timestamp). You will also need to tweak
> the query, because the planner is not quite smart enough to deduce that
> such an index is applicable to the given sort order:
> SELECT ... WHERE userid = 42 ORDER BY userid DESC, timestamp DESC
> LIMIT 5;
> This should generate an index-scan-backwards plan that will execute nigh
> instantaneously, because it will only fetch the rows you really want.
>
> You might or might not be able to drop the separate indexes on userid
> and timestamp, depending on what other queries you might have that need
> them. But you should be paying attention to what plans you are really
> getting (see EXPLAIN) rather than just assuming that some indexes chosen
> at random will do what you need.
>
> regards, tom lane
>

We do many varied queries on nearly every table - our data is highly
relational, and we have a lot of indexes. I thought the planner would pick
up the right index via constraints and not require them in ORDER BY...
EXPLAIN ANALYZE says that the indexes are being used, ala:

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
timestamp DESC LIMIT 5;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------------
Limit (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317
rows=5 loops=1)
-> Sort (cost=1608.43..1609.45 rows=407 width=8) (actual
time=0.287..0.295 rows=5 loops=1)
Sort Key: "timestamp"
-> Index Scan using comments_ix_userid on comments
(cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35
loops=1)
Index Cond: (userid = 51)
Total runtime: 0.375 ms
(6 rows)

Is this the wrong procedure? Your suggested syntax seems much more
efficient, but I don't quite understand exactly why, as PG is using our
existing indexes...

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
userid DESC, timestamp DESC LIMIT 5;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----
Limit (cost=0.00..19.90 rows=5 width=12) (actual time=0.040..0.076 rows=5
loops=1)
-> Index Scan Backward using comments_ix_userid_timestamp on comments
(cost=0.00..1620.25 rows=407 width=12) (actual time=0.035..0.054 rows=5
loops=1)
Index Cond: (userid = 51)
Total runtime: 0.134 ms
(4 rows)

Note: This was done after adding an index on comments (userid, timestamp)

Regards,

Jason

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-08-12 02:09:14 Re: Hardware upgrade for a high-traffic database
Previous Message Josh Berkus 2004-08-12 00:25:55 Re: Buld Insert and Index use.