Skip site navigation (1) Skip section navigation (2)

Re: Optimizer not using index on 120M row table

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizer not using index on 120M row table
Date: 2003-04-29 15:46:20
Message-ID: Pine.LNX.4.33.0304290926360.15964-100000@css120.ihs.com (view raw or flat)
Thread:
Lists: pgsql-general
There are a few settings that you pretty much have to change on bigger 
hardware to get good performance.  

shared_buffers (1000 to 10000 is a good place to start.) measured in 8k 
blocks.
effective_cache_size (size of OS file system and disk caches measured in 
8k blocks)

The CPU cost settings:

These two all measure the cost of each tuple operation (index or table 
respectively) as a fraction of a sequential scan.   

cpu_index_tuple_cost 
cpu_tuple_cost 

This one measures the cost of each operator in a where clause, again as a 
fraction of a sequential scan.

cpu_operator_cost 

Lastly, the big one:

random_page_cost tells the planner how much a random page fetch costs 
compared to a sequential page cost.  A setting of one would mean that a 
seq scan and an index scan are even.  

Here's the settings off of my box, which runs a dual 10krpm UWSCSI disk 
set on a dual PIII-750 with 1.5 gig ram:

shared_buffers 32768 #default 64
effective_cache_size = 100000  # default 1000
random_page_cost = 1 # default 4
cpu_tuple_cost = 0.01 # default 0.01
cpu_index_tuple_cost = 0.0001 # default 0.001
cpu_operator_cost = 0.0025 # default 0.0025

32768*8192=256M
100000*8192=780M

Note that we originally set random_page_cost to 1 long before I'd realized 
we had our effective cache size set way too low (i.e. default.) and so the 
planner was picking seq scans because it was sure the data weren't in 
memory at the time.  Setting effective cache size to the right setting 
means we could probably go back to a setting of 1.5 to 2.

It seems that when postgresql picks an index scan when it should 
have picked a seq scan, the cost is that you're up to twice as slow as a 
seq scan, but picking a seq scan when it shoulda picked an index can 
result in performance 10 times slower, so we fall on the side of caution 
and favor index scans over seq scans because of this.


In response to

Responses

pgsql-general by date

Next:From: scott.marloweDate: 2003-04-29 15:51:13
Subject: Re: Importing from Access 2000?
Previous:From: Gerhard HintermayerDate: 2003-04-29 15:44:04
Subject: Re: Backend memory leakage when inserting

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group