Skip site navigation (1) Skip section navigation (2)

Re: merge>hash>loop

From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: merge>hash>loop
Date: 2006-04-18 10:51:59
Message-ID: 4444C4CF.2000106@logix-tt.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi, Tom,

Tom Lane wrote:

> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans.
> The current estimation for that is basically "outer scan cost plus N
> times inner scan cost" where N is the estimated number of outer tuples;
> in other words the repeated indexscan probes are each assumed to happen
> from a cold start.  In reality, caching of the upper levels of the index
> means that the later index probes are much cheaper than this model
> thinks.  We've known about this for some time but no one's yet proposed
> a more reasonable cost model.

My spontaneus guess would be to use log(N)*inner instead of N*inner. I
don't have any backings for that, it's just what my intuition tells me
as a first shot.

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero.  In a real
> environment it's much cheaper to use heavily-used indexes than this cost
> model suggests, because they'll already be swapped in due to use by
> previous queries.  But we haven't got any infrastructure to keep track
> of what's been heavily used, let alone a cost model that could make use
> of the info.

An easy first approach would be to add a user tunable cache probability
value to each index (and possibly table) between 0 and 1. Then simply
multiply random_page_cost with (1-that value) for each scan.

Later, this value could be automatically tuned by stats analysis or
other means.

> I think part of the reason that people commonly reduce random_page_cost
> to values much lower than physical reality would suggest is that it
> provides a crude way of partially compensating for this basic problem.

I totall agree with this, it's just what we did here from time to time. :-)

Hmm, how does effective_cach_size correspond with it? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

In response to

Responses

pgsql-performance by date

Next:From: Magnus HaganderDate: 2006-04-18 11:56:44
Subject: Re: Inserts optimization?
Previous:From: Markus SchaberDate: 2006-04-18 09:02:34
Subject: Re: Inserts optimization?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group