Re: merge>hash>loop

From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: merge>hash>loop
Date: 2006-04-18 10:51:59
Message-ID: 4444C4CF.2000106@logix-tt.com
Lists: pgsql-performance

Hi, Tom,

Tom Lane wrote:

> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans.
> The current estimation for that is basically "outer scan cost plus N
> times inner scan cost" where N is the estimated number of outer tuples;
> in other words the repeated indexscan probes are each assumed to happen
> from a cold start. In reality, caching of the upper levels of the index
> means that the later index probes are much cheaper than this model
> thinks. We've known about this for some time but no one's yet proposed
> a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have any backing for that; it's just what my intuition tells me
as a first shot.
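
To put that in concrete terms, here is a rough sketch of the two
formulas (the names are mine and purely illustrative, not the actual
costsize.c variables):

    #include <math.h>

    /* Illustrative sketch only -- not the real PostgreSQL cost functions. */
    static double
    nestloop_cost_current(double outer_cost, double outer_rows, double inner_cost)
    {
        /* current model: every inner index probe is charged as a cold start */
        return outer_cost + outer_rows * inner_cost;
    }

    static double
    nestloop_cost_guess(double outer_cost, double outer_rows, double inner_cost)
    {
        /* my gut feeling: charge roughly log(N) * inner, assuming the upper
         * index levels stay cached after the first few probes */
        return outer_cost + log(outer_rows) * inner_cost;
    }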

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero. In a real
> environment it's much cheaper to use heavily-used indexes than this cost
> model suggests, because they'll already be swapped in due to use by
> previous queries. But we haven't got any infrastructure to keep track
> of what's been heavily used, let alone a cost model that could make use
> of the info.

An easy first approach would be to add a user-tunable cache probability
value between 0 and 1 to each index (and possibly table). Then simply
multiply random_page_cost by (1 - that value) for each scan.

Later, this value could be automatically tuned by stats analysis or
other means.
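
As a rough sketch of what I have in mind (purely hypothetical names,
nothing of the sort exists in the planner today):

    /* Hypothetical sketch -- not existing planner code. */
    static double
    effective_random_page_cost(double random_page_cost, double cache_probability)
    {
        /* cache_probability: user-tunable per index/table, 0.0 .. 1.0;
         * a fully cached relation (1.0) would make random probes free */
        return random_page_cost * (1.0 - cache_probability);
    }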

> I think part of the reason that people commonly reduce random_page_cost
> to values much lower than physical reality would suggest is that it
> provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size correspond with it? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
