
Re: More thoughts about planner's cost estimates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: More thoughts about planner's cost estimates
Date: 2006-06-01 19:15:09
Message-ID: 14438.1149189309@sss.pgh.pa.us
Lists: pgsql-hackers
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Yeah.  I've refrained from proposing changes because it's a 
> pick-up-sticks.  If we start modifying the model, we need to fix 
> *everything*, not just one item.  And then educate our users that they 
> need to use the GUC variables in a different way.  Here's the issues I see:

> [ various deficiencies in our statistics gathering ]

Those are all valid points, but pretty much orthogonal to what I'm on
about today.  The problems I'm thinking about are seen even when the
planner's rowcount estimates are dead on (as indeed they mostly were
in Philippe's example).

> 5. random_page_cost (as previously discussed) is actually a function of 
> relatively immutable hardware statistics, and as such should not need to 
> exist as a GUC once the cost model is fixed.

Well, it'll still exist as a GUC for the same reasons the other ones are
GUCs.  But I think the main reasons people have needed to twiddle it are
	(1) their database is completely RAM-resident (where RPC
	    *should* logically be 1.0), or
	(2) they're trying to compensate for the overestimation of
	    nestloop indexscans.
The changes I'm proposing should help with (2).

> For btrees, we should be able to accurately estimate the cost of the 
> index access given:
> a) the depth of the tree;

Although logically the tree descent *ought* to be a factor, I haven't
seen any evidence that we need to take it into account; in fact all the
evidence points to the conclusion that we're better off ignoring it.
My theory about why is that the upper levels of the tree stay in cache.
I have to admit that's just handwaving, but counting additional disk
hits to fetch the first index tuple is clearly not the direction the
cost model needs to go in.  Look again at the examples in Philippe's
output: an indexscan fetching 1700 tuples (about 5 leaf pages probably)
took 32x longer than one fetching 7 tuples (presumably all on the same
leaf page).  There's no way that a model that counts an additional
couple of page fetches for tree descent is going to arrive at those
numbers.  And we see this over and over again: incredibly small times
to fetch a single index tuple, at least on the average when the index
is being hit many times in one query.
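To make that concrete, here's a toy calculation (my own illustrative numbers, not the planner's actual code) of what a descent-counting model would predict for Philippe's two scans, assuming a couple of non-leaf page fetches to reach the first leaf:

```python
# Toy model: charge a fixed number of non-leaf page fetches for the
# btree descent, plus one fetch per leaf page visited.  The point is
# that no such model gets anywhere near a 32x cost ratio.
def indexscan_pages(leaf_pages, descent_pages=2):
    # descent_pages = non-leaf pages read to reach the first leaf
    return descent_pages + leaf_pages

big = indexscan_pages(5)    # ~1700 tuples, about 5 leaf pages -> 7 fetches
small = indexscan_pages(1)  # 7 tuples, one leaf page -> 3 fetches
print(big / small)          # ratio is only ~2.3, nowhere near 32
```

Cranking descent_pages up only flattens the ratio further, which is why counting descent fetches moves the model in the wrong direction.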

> c) whether the index and tables are already in shared_mem, or else (d) 
> and (e) below:
> d) the probability that the index pages are cached in memory, which is 
> composed of:
> 	(1) the frequency with which that index is accessed, modified by
> 	(2) whether the index is a fraction of available ram, or larger than ram
> e) the probability that the relevant table pages are cached in memory, 
> determined by the same two factors.

These would all be nice things to know, but I'm afraid it's pie in the
sky.  We have no reasonable way to get those numbers.  (And if we could
get them, there would be another set of problems, namely plan instability:
the planner's choices would become very difficult to reproduce.)

>> The big difficulty in modeling cache effects from multiple scans is that
>> we usually don't know how many times the index will be scanned.

> I think we can work around this by figuring out whether the index has 
> already been scanned in the plan, something we *can* know.  So the first 
> scan will be full cost and remaining scans will be fractional cost. 

Uh, no, because what we're normally thinking about is independent probes
into the index with different keys.  For a small number of probes you
have to figure that those all hit different leaf pages and aren't going
to amortize well.  As the number of probes goes up, you can start
charging fractional costs because it's more likely you're hitting a leaf
page you hit recently.  The Mackert/Lohman equations do this reasonably
well --- it's possible we can find something better, but those seem like
a good place to start.  The immediate problem is to get an
infrastructure in place that gives us some numbers to apply equations to.
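For anyone who hasn't looked at them lately, the Mackert/Lohman approximation estimates how many *distinct* pages get fetched when a relation of T pages is probed N times with b buffer pages of cache.  A sketch in Python (variable names mine; this follows the published formula, not necessarily the exact planner code):

```python
def index_pages_fetched(tuples_fetched, rel_pages, cache_pages):
    """Mackert/Lohman estimate of distinct page fetches for
    tuples_fetched probes into a relation of rel_pages pages,
    given cache_pages buffer pages available."""
    T = max(float(rel_pages), 1.0)
    N = float(tuples_fetched)
    b = max(float(cache_pages), 1.0)
    if T <= b:
        # whole relation fits in cache
        return min(2.0 * T * N / (2.0 * T + N), T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if N <= lim:
        # cache not yet saturated
        return 2.0 * T * N / (2.0 * T + N)
    # cache saturated: extra probes mostly miss
    return b + (N - lim) * (T - b) / T
```

It has the right qualitative shape: for a handful of probes the estimate is close to one page per probe (little amortization), and as N grows the per-probe cost falls off, bounded by T when everything ends up cached.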

			regards, tom lane
