Re: Tuning planner cost estimates

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Tuning planner cost estimates
Date: 2005-05-20 22:49:17
Message-ID: 20050520224917.GN44623@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, May 20, 2005 at 03:23:16PM -0700, Josh Berkus wrote:
> Jim,
>
> > Well, that raises an interesting issue, because AFAIK none of the cost
> > estimate functions currently do that. Heck, AFAIK even the piggyback
> > seqscan code doesn't take other seqscans into account.
>
> Sure. But you're striving for greater accuracy, no?
>
> Actually, all that's really needed in the way of concurrent activity is a
> calculated factor that lets us know how likely a particular object is to be
> cached, either in the fs cache or the pg cache (with different factors for
> each presumably) based on history. Right now, that's based on
> estimated_cache_size, which is rather innacurate: a table which is queried
> once a month has the exact same cost factors as one which is queried every
> 2.1 seconds. This would mean an extra column in pg_stats I suppose.

True, though that's a somewhat different issue that what the load on the
box is (see the reply I just posted). Load on the box (particuarly IO
load) will also play a factor for things; for example, it probably means
seqscans end up costing a lot more than random IO does, because the disk
heads are being sent all over the place anyway.

> > But ultimately, I'm not sure if this is really required or not, because
> > I don't see that we need to use explain when running queries. In fact,
> > it's possibly desireable that we don't, because of the overhead it
> > incurs. We would want to log an explain (maybe analyze) just to make
> > sure we knew what the optimizer was doing, but I think we shouldn't need
> > the info to produce cost estimates.
>
> Well, the problem is that you need to know how much time the index scan took
> vs. other query steps. I don't see a way to do this other than an anayze.

True, but that can be done by a seperate seqscan step. I would argue
that doing it that way is actually more accurate, because the overhead
of explain analyze is huge and tends to swamp other factors out. As I
mentioned in my other email, my tests show explain analyze select * from
table is 5x slower than select count(*) from table.
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-05-20 23:06:51 Re: Tuning planner cost estimates
Previous Message Jim C. Nasby 2005-05-20 22:40:55 Re: Tuning planner cost estimates