Re: Query Optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
Cc: Luiz Eduardo Cantanhede Neri <lecneri(at)gmail(dot)com>, Zach Calvert <zachcalvert(at)hemerasoftware(dot)com>, pgsql-novice(at)postgresql(dot)org
Subject: Re: Query Optimization
Date: 2009-05-27 15:34:03
Message-ID: 11053.1243438443@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Sean Davis <sdavis2(at)mail(dot)nih(dot)gov> writes:
>> zachcalvert(at)hemerasoftware(dot)com> wrote:
>>> I have a query and I have run
>>> explain analyze
>>> select count(*)
>>> from score
>>> where leaderboardid=35 and score <= 6841 and active
>>>
>>> The result is
>>> "Aggregate (cost=2491.06..2491.07 rows=1 width=0) (actual
>>> time=38.878..38.878 rows=1 loops=1)"
>>> " -> Seq Scan on score (cost=0.00..2391.17 rows=39954 width=0)
>>> (actual time=0.012..30.760 rows=38571 loops=1)"
>>> " Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
>>> "Total runtime: 38.937 ms"
>>>
>>> I have an index on score, I have an index on score, leaderboardid, and
>>> active and still it does a sequential scan.

> Postgresql is aware of the "cost" associated with each query. In the case
> of a small table with an index that is not very discriminative, it may
> choose a sequential scan. However, as you add more rows, the index scan may
> become more effective and may be used instead. One thing to keep in mind is
> that an index scan is NOT always faster than a sequential scan.

A crude rule of thumb is that you need the query to fetch less than
ten percent of the rows before a bitmap scan is going to be a win,
and less than one percent before a plain indexscan is going to be a win.
(If your database is entirely cached in memory then the crossover
percentages are higher, and you need to adjust the planner's cost
parameters so that it gets this right.) It's not clear exactly how
big this table is, but I'm betting the query is fetching more than
ten percent of it.

One point worth making is that if this is the typical set of conditions
in your queries, then the best index would be one on (leaderboardid,
score) not (score, leaderboardid, active). (I'm betting that the
condition active = true is so nonselective it's not worth keeping it in
the index at all.) You want equality conditions on the leading
column(s) and inequalities on the trailing columns. To see why this is,
think about the index sort ordering and the portion of the index that
the query will have to scan. In the latter case the set of index
entries matching this query is a contiguous group; in the former, not.

Our fine manual has a reasonable amount of detail about proper index
design:
http://www.postgresql.org/docs/8.3/static/indexes.html

regards, tom lane

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Thomas Kellerer 2009-05-27 16:00:39 Re: Tool for modeling
Previous Message Joshua Tolley 2009-05-27 15:29:51 Re: Postgres registry access using java