
Re: reducing random_page_cost from 4 to 2 to force index scan

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-05-13 20:51:08
Message-ID:
Lists: pgsql-performance
2011/5/13 Josh Berkus <josh(at)agliodbs(dot)com>:
>> I guess maybe the reason why it didn't matter for the OP is that - if
>> the size of the index page in pages is smaller than the pro-rated
>> fraction of effective_cache_size allowed to the index - then the exact
>> value doesn't affect the answer.
>> I apparently need to study this code more.
> FWIW: random_page_cost is meant to be the ratio between the cost of
> looking up a single row as and index lookup, and the cost of looking up
> that same row as part of a larger sequential scan.  For specific
> storage, that coefficient should be roughly the same regardless of the
> table size.  So if your plan for optimization involves manipulating RPC
> for anything other than a change of storage, you're Doing It Wrong.
> Instead, we should be fixing the formulas these are based on and leaving
> RPC alone.
> For any data page, there are actually four costs associated with each
> tuple lookup, per:
> in-memory/seq   | on disk/seq
> ----------------+----------------
> in-memory/random| on disk/random

It lacks some of the more theoretical costs like sort_page/temp_page: those are based on a ratio of seq_page_cost and random_page_cost, or on seq_page_cost alone (when spilling out of work_mem).

Memory access is accounted for with a factor of around 0.1 in some places, AFAIR. (And random vs. sequential memory access is the same at the level of estimation we do.)
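To make the ratio concrete, here is a rough sketch (in Python, using PostgreSQL's default cost parameters, counting page fetches only and ignoring the CPU cost terms that costsize.c also adds):

```python
# Rough sketch of how page-fetch costs enter the planner's estimates.
# Uses PostgreSQL's default GUC values; the real costing in costsize.c
# also adds cpu_tuple_cost, cpu_index_tuple_cost, operator costs, etc.
SEQ_PAGE_COST = 1.0     # cost of a sequentially fetched page
RANDOM_PAGE_COST = 4.0  # cost of a non-sequentially fetched page

def seq_scan_page_cost(table_pages):
    # A sequential scan reads every heap page at the sequential rate.
    return table_pages * SEQ_PAGE_COST

def index_scan_page_cost(heap_pages_fetched):
    # A plain index scan touches heap pages in index order, which the
    # planner prices (at worst) as random I/O.
    return heap_pages_fetched * RANDOM_PAGE_COST

# 10000-page table: a full scan costs 10000 page units, while an index
# scan touching only 1% of the pages costs 400. The coefficient between
# the two is exactly the RPC/seq_page_cost ratio Josh describes.
print(seq_scan_page_cost(10000))   # 10000.0
print(index_scan_page_cost(100))   # 400.0
```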

> (yes, there's actually more for bitmapscan etc.  but the example holds)

(If I read the sources correctly, for this one the cost of the heap page
fetches is interpolated linearly between random_page_cost and
seq_page_cost, plus the Mackert and Lohman formula where needed, used for
what it is best at: predicting what should be in cache *because* of the
current query's execution, not because of the current state of the page
cache.)
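For reference, the Mackert and Lohman estimate is spelled out in the comments of index_pages_fetched() in costsize.c; a Python transcription of that piecewise formula might look like this (variable names follow the source comment: T heap pages, Ns tuples fetched, b cache pages):

```python
def index_pages_fetched(tuples_fetched, pages, cache_pages):
    """Mackert-Lohman estimate of distinct heap pages fetched, following
    the formula quoted in PostgreSQL's index_pages_fetched() comments.

    tuples_fetched -- Ns: expected number of tuples the scan will fetch
    pages          -- T:  heap pages in the relation
    cache_pages    -- b:  cache size in pages (the pro-rated share of
                          effective_cache_size for this relation)
    """
    T = max(float(pages), 1.0)
    Ns = float(tuples_fetched)
    b = max(float(cache_pages), 1.0)
    if T <= b:
        # Relation fits in cache: no page is fetched more than once.
        pf = 2.0 * T * Ns / (2.0 * T + Ns)
        return min(pf, T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if Ns <= lim:
        # Cache not yet overflowed by this scan.
        return 2.0 * T * Ns / (2.0 * T + Ns)
    # Cache overflows: further fetches increasingly hit uncached pages.
    return b + (Ns - lim) * (T - b) / T
```

Note that b here is derived from the current query's own footprint against effective_cache_size, which is why the formula models cache hits caused by this execution rather than whatever happens to be in the OS page cache.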

> For any given tuple lookup, then, you can assign a cost based on where
> you think that tuple falls in that quadrant map.  Since this is all
> probability-based, you'd be assigning a cost as a mixed % of in-memory
> and on-disk costs.  Improvements in accuracy of this formula would come
> through improvements in accuracy in predicting if a particular data page
> will be in memory.
> This is what the combination of random_page_cost and
> effective_cache_size ought to supply, but I don't think it does, quite.
> --
> Josh Berkus
> PostgreSQL Experts Inc.
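The mixed-percentage costing described above could be sketched as follows (purely illustrative, not the planner's actual code; the in-memory page cost of 0.01 is an assumed value, and the hit probability p_cached is precisely the quantity that is hard to estimate per page):

```python
# Illustrative only: expected per-page cost as a probability-weighted
# mix of the quadrants (in-memory vs. on-disk, random vs. sequential).
# Per the note above, random and sequential access cost the same once
# the page is in memory, so the in-memory quadrants collapse into one.
def expected_page_cost(p_cached, random_frac,
                       cached_cost=0.01,       # assumed in-memory cost
                       seq_page_cost=1.0,      # PostgreSQL default
                       random_page_cost=4.0):  # PostgreSQL default
    # Disk-side cost: linear blend between random and sequential fetch.
    disk_cost = (random_frac * random_page_cost
                 + (1.0 - random_frac) * seq_page_cost)
    # Weight by the probability that the page is already cached.
    return p_cached * cached_cost + (1.0 - p_cached) * disk_cost
```

With p_cached = 0.9 and purely random access, the expected cost drops from 4.0 to about 0.41 per page, which shows why a well-cached workload behaves as if random_page_cost were much lower even though the underlying storage ratio has not changed.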

Cédric Villemain
2ndQuadrant
PostgreSQL: Expertise, Training and Support

