Re: merge>hash>loop

From: Markus Schaber <schabi(at)logix-tt(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: merge>hash>loop
Date: 2006-04-18 10:51:59
Message-ID: 4444C4CF.2000106@logix-tt.com
Lists: pgsql-performance

Hi, Tom,

Tom Lane wrote:

> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans.
> The current estimation for that is basically "outer scan cost plus N
> times inner scan cost" where N is the estimated number of outer tuples;
> in other words the repeated indexscan probes are each assumed to happen
> from a cold start. In reality, caching of the upper levels of the index
> means that the later index probes are much cheaper than this model
> thinks. We've known about this for some time but no one's yet proposed
> a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have any backing for that; it's just what my intuition tells me
as a first shot.
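
To put that in concrete terms, here is a rough sketch of the two
formulas (the names are mine and purely illustrative, not the actual
costsize.c variables):

    #include <math.h>

    /* Illustrative sketch only -- not the real PostgreSQL cost functions. */
    static double
    nestloop_cost_current(double outer_cost, double outer_rows, double inner_cost)
    {
        /* current model: every inner index probe is charged as a cold start */
        return outer_cost + outer_rows * inner_cost;
    }

    static double
    nestloop_cost_guess(double outer_cost, double outer_rows, double inner_cost)
    {
        /* my gut feeling: charge roughly log(N) * inner, assuming the upper
         * index levels stay cached after the first few probes */
        return outer_cost + log(outer_rows) * inner_cost;
    }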

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero. In a real
> environment it's much cheaper to use heavily-used indexes than this cost
> model suggests, because they'll already be swapped in due to use by
> previous queries. But we haven't got any infrastructure to keep track
> of what's been heavily used, let alone a cost model that could make use
> of the info.

An easy first approach would be to add a user-tunable cache probability
value between 0 and 1 to each index (and possibly table). Then simply
multiply random_page_cost by (1 - that value) for each scan.

Later, this value could be automatically tuned by stats analysis or
other means.
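
As a rough sketch of what I have in mind (purely hypothetical names,
nothing of the sort exists in the planner today):

    /* Hypothetical sketch -- not existing planner code. */
    static double
    effective_random_page_cost(double random_page_cost, double cache_probability)
    {
        /* cache_probability: user-tunable per index/table, 0.0 .. 1.0;
         * a fully cached relation (1.0) would make random probes free */
        return random_page_cost * (1.0 - cache_probability);
    }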

> I think part of the reason that people commonly reduce random_page_cost
> to values much lower than physical reality would suggest is that it
> provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size correspond with it? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
