Re: reducing random_page_cost from 4 to 2 to force index scan

From: Sok Ann Yap <sokann(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-28 00:19:01
Message-ID: BANLkTi=u9HeA3odW7Vs8z_bBWacZrqVF0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Sok Ann Yap <sokann(at)gmail(dot)com> wrote:
>
>> Anyway, the overhead of spawning 44 extra queries means that it is
>> still better off for me to stick with the original query and tune
>> PostgreSQL to choose index scan.
>
> Maybe, but what is *best* for you is to tune PostgreSQL so that your
> costs are accurately modeled, at which point it will automatically
> pick the best plan for most or all of your queries without you
> needing to worry about it.
>
> If you set your effective_cache_size to the sum of shared_buffers
> and what your OS reports as cache after you've been running a while,
> that will help the optimizer know what size index fits in RAM, and
> will tend to encourage index use.  If the active portion of your
> data is heavily cached, you might want to set random_page_cost and
> seq_page_cost to the same value, and make that value somewhere in
> the 0.1 to 0.05 range.  If you have moderate caching, using 1 and 2
> can be good.
>
> If you're still not getting reasonable plans, please post again with
> more information about your hardware along with the query and its
> EXPLAIN ANALYZE output.
>
> -Kevin
>

I understand the need to tune PostgreSQL properly for my use case.
What I am curious about is, for the data set I have, under what
circumstances (hardware/workload/cache status/etc) would a sequential
scan really be faster than an index scan for that particular query?

To simulate a scenario when nothing is cached, I stopped PostgreSQL,
dropped all system cache (sync; echo 3 > /proc/sys/vm/drop_caches),
restarted PostgreSQL, and ran the query. A sequential scan run took
13.70 seconds, while an index scan run took 0.34 seconds, which is
still 40 times faster.

Also, I tried increasing effective_cache_size from 512MB to 3GB (the
database size is 2+GB), and it still favor sequential scan. The
estimated costs did not change at all.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2011-04-28 02:18:37 Re: index usage on queries on inherited tables
Previous Message Samuel Gendler 2011-04-28 00:18:17 Re: index usage on queries on inherited tables