Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Guillaume Cottenceau <gc(at)mnc(dot)ch>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: planner with index scan cost way off actual cost, advice on tweaking cost constants?
Date: 2006-03-21 11:58:35
Message-ID: 20060321115835.GN15742@pervasive.com
Lists: pgsql-performance

On Tue, Mar 21, 2006 at 11:13:06AM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> [...]
>
> > > My point is that the planner's cost estimate is way above the
> > > actual cost of the query, so the planner doesn't choose the best
> > > plan. Even if the index returns that many rows, the actual cost
> > > of the query is such that an index scan (even in the worst case,
> > > with the disk cache flushed) is still cheaper than a seq scan,
> > > but the planner uses a seq scan.
> >
> > Yes. The cost estimator for an index scan supposedly does a linear
> > interpolation between a minimum cost and a maximum cost depending on the
> > correlation of the first field in the index. The problem is that while
> > the comment states it's a linear interpolation, the actual formula
> > squares the correlation before interpolating. This means that unless the
> > correlation is very high, you're going to get an unrealistically high
> > cost for an index scan. I have data that supports this at
> > http://stats.distributed.net/~decibel/, but I've never been able to get
> > around to testing a patch to see if it improves things.
>
> Interesting.
>
> It would be nice to investigate the reasoning behind the formula
> you describe being chosen to perform the interpolation. I have
> absolutely no knowledge of pg internals, so this is all rather new
> to me, and I have no idea how smart that choice is (but based on
> my general feeling about pg, I suspect it is actually smart and I
> am just not smart enough to see why ;p).

If you feel like running some tests, you need to change

run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

in src/backend/optimizer/path/costsize.c to something like

run_cost += max_IO_cost + fabs(indexCorrelation) * (min_IO_cost - max_IO_cost);

(That's fabs() rather than abs(): indexCorrelation is a double, and
C's abs() takes an int.)

That might not produce a perfect cost estimate, but I'll wager that it
will be substantially better than what's in there now. FYI, see also
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
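
For illustration, here's a quick standalone sketch (not PostgreSQL
code; the min/max cost figures are invented placeholders) comparing
the two formulas across the correlation range:

#include <math.h>
#include <stdio.h>

/*
 * Toy comparison of the two interpolation formulas discussed above.
 * min_IO_cost/max_IO_cost are made-up numbers standing in for the
 * planner's best-case (fully correlated index) and worst-case
 * (random I/O) index scan costs.
 */
int main(void)
{
    double min_IO_cost = 100.0;
    double max_IO_cost = 10000.0;
    double corr;

    for (corr = 0.0; corr <= 1.0; corr += 0.25)
    {
        double csquared = corr * corr;
        /* current formula: squares the correlation */
        double cost_sq = max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
        /* proposed formula: plain linear interpolation */
        double cost_lin = max_IO_cost + fabs(corr) * (min_IO_cost - max_IO_cost);

        printf("corr=%.2f  squared=%7.0f  linear=%7.0f\n",
               corr, cost_sq, cost_lin);
    }
    return 0;
}

With these numbers, at a correlation of 0.5 the squared formula has
only moved a quarter of the way from the worst-case cost toward the
best-case cost (7525), while plain linear interpolation has moved
halfway (5050). That's why mid-range correlations end up with such
pessimistic index scan estimates.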
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
