Re: Optimizer not using index on 120M row table

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizer not using index on 120M row table
Date: 2003-04-29 15:46:20
Message-ID: Pine.LNX.4.33.0304290926360.15964-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

There are a few settings that you pretty much have to change on bigger
hardware to get good performance.

shared_buffers (1000 to 10000 is a good place to start.) measured in 8k
blocks.
effective_cache_size (size of OS file system and disk caches measured in
8k blocks)

The CPU cost settings:

These two all measure the cost of each tuple operation (index or table
respectively) as a fraction of a sequential scan.

cpu_index_tuple_cost
cpu_tuple_cost

This one measures the cost of each operator in a where clause, again as a
fraction of a sequential scan.

cpu_operator_cost

Lastly, the big one:

random_page_cost tells the planner how much a random page fetch costs
compared to a sequential page cost. A setting of one would mean that a
seq scan and an index scan are even.

Here's the settings off of my box, which runs a dual 10krpm UWSCSI disk
set on a dual PIII-750 with 1.5 gig ram:

shared_buffers 32768 #default 64
effective_cache_size = 100000 # default 1000
random_page_cost = 1 # default 4
cpu_tuple_cost = 0.01 # default 0.01
cpu_index_tuple_cost = 0.0001 # default 0.001
cpu_operator_cost = 0.0025 # default 0.0025

32768*8192=256M
100000*8192=780M

Note that we originally set random_page_cost to 1 long before I'd realized
we had our effective cache size set way too low (i.e. default.) and so the
planner was picking seq scans because it was sure the data weren't in
memory at the time. Setting effective cache size to the right setting
means we could probably go back to a setting of 1.5 to 2.

It seems that when postgresql picks an index scan when it should
have picked a seq scan, the cost is that you're up to twice as slow as a
seq scan, but picking a seq scan when it shoulda picked an index can
result in performance 10 times slower, so we fall on the side of caution
and favor index scans over seq scans because of this.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message scott.marlowe 2003-04-29 15:51:13 Re: Importing from Access 2000?
Previous Message Gerhard Hintermayer 2003-04-29 15:44:04 Re: Backend memory leakage when inserting