Re: reducing random_page_cost from 4 to 2 to force index scan

From: Sok Ann Yap <sokann(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-04-27 22:34:28
Message-ID: BANLkTi=NFNTZ8n4zB0L+TLdurnSnnpx9Pw@mail.gmail.com
Lists: pgsql-performance

On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Sok Ann Yap  wrote:
>> Kevin Grittner  wrote:
>
>>> Please show us your overall configuration and give a description
>>> of the hardware (how many of what kind of cores, how much RAM,
>>> what sort of storage system).
>
>> Here's the configuration (this is just a low end laptop):
>
>> version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu,
>> compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0,
>> pie-0.4.5) 4.5.2, 64-bit
>> checkpoint_segments | 16
>> default_statistics_target | 10000
>
> Usually overkill.  If this didn't help, you should probably change it
> back.
>
>> effective_cache_size | 512MB
>> lc_collate | en_US.UTF-8
>> lc_ctype | en_US.UTF-8
>> listen_addresses | *
>> log_destination | syslog
>> log_min_duration_statement | 0
>> maintenance_work_mem | 256MB
>> max_connections | 100
>
> You probably don't need this many connections.
>
>> max_stack_depth | 2MB
>> port | 5432
>> random_page_cost | 4
>> server_encoding | UTF8
>> shared_buffers | 256MB
>> silent_mode | on
>> TimeZone | Asia/Kuala_Lumpur
>> wal_buffers | 1MB
>> work_mem | 32MB
>> (20 rows)
>
> It's hard to recommend other changes without knowing the RAM on the
> system.  How many of what kind of CPUs would help, too.
>
>> The thing is, the query I posted was fairly simple (I think), and
>> PostgreSQL should be able to choose the 3000+ times faster index
>> scan with the default random_page_cost of 4.
>
> It picks the plan with the lowest estimated cost.  If it's not
> picking the best plan, that's usually an indication that you need to
> adjust cost factors so that estimates better model the actual costs.
>
>> If I need to reduce it to 2 when using a 5.4k rpm slow disk, what
>> is random_page_cost = 4 good for?
>
> It's good for large databases with a lot of physical disk I/O.  In
> fact, in some of those cases, it needs to be higher.  In your test,
> the numbers indicate that everything was cached in RAM.  That makes
> the effective cost very low.
>
> Also, the odds are that you have more total cache space between the
> shared_buffers and the OS cache than the effective_cache_size
> setting, so the optimizer doesn't expect the number of cache hits
> you're getting on index usage.
>
> -Kevin
>

Thanks for the tips and explanation. I had wrongly assumed that the
random_page_cost value is independent of caching.

Now, let's go back to the original query:

SELECT
    salutations.id,
    salutations.name,
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE salutations.id = contacts.salutation_id
    ) AS in_use
FROM salutations
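
To see which plan the planner actually chooses here, and how its estimates
compare to reality, EXPLAIN ANALYZE can be run on the same query (a sketch,
reusing the table and column names from the query above):

```sql
-- Shows the chosen plan, estimated vs. actual row counts, and timing,
-- which makes it clear whether the EXISTS subquery is being executed
-- as sequential scans or index probes on contacts.
EXPLAIN ANALYZE
SELECT
    salutations.id,
    salutations.name,
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE salutations.id = contacts.salutation_id
    ) AS in_use
FROM salutations;
```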

If I split up the query, i.e. running this once:

SELECT
    salutations.id,
    salutations.name
FROM salutations

and then running this 44 times, once for each row:

SELECT
    EXISTS (
        SELECT 1
        FROM contacts
        WHERE contacts.salutation_id = ?
    ) AS in_use

I can see that PostgreSQL smartly picks the best plan for each value: for
common salutations (Madam, Ms, etc.), it does a sequential scan, while for
salutations that are rarely used or not used at all, it does an index scan.
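
The per-value plan choice can be confirmed with EXPLAIN on the probe query
(a sketch; the literal salutation_id values below are hypothetical):

```sql
-- A frequent salutation_id value should produce a Seq Scan on contacts,
-- while a rare or absent value should produce an Index Scan on the
-- index covering contacts.salutation_id.
EXPLAIN SELECT 1 FROM contacts WHERE salutation_id = 1;   -- common value
EXPLAIN SELECT 1 FROM contacts WHERE salutation_id = 42;  -- rare value
```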

Anyway, the overhead of issuing 44 extra queries means I am still better
off sticking with the original query and tuning PostgreSQL to choose the
index scan.
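
Rather than lowering random_page_cost globally in postgresql.conf, the
setting can be overridden for just this session or transaction, so the
tweak only affects this workload; a sketch:

```sql
-- Session-local override: affects only this connection's planner,
-- leaving the server-wide configuration untouched.
SET random_page_cost = 2;

-- Or scope it to a single transaction:
BEGIN;
SET LOCAL random_page_cost = 2;
-- ... run the query here ...
COMMIT;
```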
