Re: More thoughts about planner's cost estimates

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: More thoughts about planner's cost estimates
Date:	2006-06-01 18:32:21
Message-ID:	200606011132.22066.josh@agliodbs.com
Lists:	pgsql-hackers

Greg,

> I'm convinced these two are more connected than you believe.

Actually, I think they are inseparable.

> I might be interested in implementing that algorithm that was posted a
> while back that involved generating good unbiased samples of discrete
> values. The algorithm was quite clever and well described, and the paper
> claimed impressively good results.
>
> However it will only make sense if people are willing to accept that
> analyze will need a full table scan -- at least for tables where the DBA
> knows that good n_distinct estimates are necessary.

What about block-based sampling? Sampling 1 in 20 disk pages, rather than
1 in 20 rows, should require significantly less scanning, and yet give us
a large enough sample for reasonable accuracy.
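
To illustrate why page-level sampling reads less, here is a minimal Python
sketch. It is not PostgreSQL code: the table is modelled as row numbers
packed into fixed-size pages, and the function names, the 100 rows per page
and the table size are all assumptions chosen for illustration. It compares
how many pages a uniform 1-in-20 row sample touches with how many a
1-in-20 page sample touches.

```python
import random


def pages_touched_by_row_sample(n_rows, rows_per_page, fraction, rng):
    """Sample individual rows uniformly; return how many distinct pages they hit."""
    k = int(n_rows * fraction)
    sampled_rows = rng.sample(range(n_rows), k)
    return len({row // rows_per_page for row in sampled_rows})


def block_sample(n_rows, rows_per_page, fraction, rng):
    """Sample whole pages uniformly; return (pages read, row numbers obtained)."""
    n_pages = (n_rows + rows_per_page - 1) // rows_per_page
    k = max(1, int(n_pages * fraction))
    pages = rng.sample(range(n_pages), k)
    rows = []
    for page in pages:
        start = page * rows_per_page
        rows.extend(range(start, min(start + rows_per_page, n_rows)))
    return pages, rows


if __name__ == "__main__":
    rng = random.Random(42)
    n_rows, rows_per_page, fraction = 100_000, 100, 0.05  # hypothetical table

    row_pages = pages_touched_by_row_sample(n_rows, rows_per_page, fraction, rng)
    pages, rows = block_sample(n_rows, rows_per_page, fraction, rng)

    print("row sampling touches", row_pages, "pages")
    print("block sampling reads", len(pages), "pages for", len(rows), "rows")
```

Both approaches return the same number of rows, but the row sample lands on
nearly every page, while the block sample reads only the pages it picked.
The sketch says nothing about accuracy: rows that share a page may be
correlated, which is the part that would need care in a real estimator.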

> > 3. We don't have any method to analyze inter-column correlation within
> > a table;
> >
> > 4. We don't keep statistics on foreign key correlation;
>
> Gosh these would be nice but they sound like hard problems. Has anybody
> even made any headway in brainstorming how to tackle them?

There's no time like the present!

Actually, these both seem like fairly straightforward problems
storage-wise. The issue is deriving the statistics, for tables with many
columns or FKs.
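
As a rough illustration of why derivation is the hard part, the Python
sketch below computes one statistic per column pair from a sample. The
choice of Pearson correlation and the function names are assumptions made
for the example, not something proposed in this thread. Storage is one
number per pair, but the number of pairs grows quadratically with the
column count.

```python
from itertools import combinations
from math import comb, sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric samples (0.0 if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(columns):
    """Map each unordered column pair to its correlation over the sample."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


if __name__ == "__main__":
    sample = {  # hypothetical sampled column values
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],
        "c": [4, 3, 2, 1],
    }
    for pair, r in pairwise_correlations(sample).items():
        print(pair, round(r, 3))
    # Pairs to analyze for a 50-column table: 50 * 49 / 2
    print("pairs for 50 columns:", comb(50, 2))
```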

> > 5. random_page_cost (as previously discussed) is actually a function
> > of relatively immutable hardware statistics, and as such should not
> > need to exist as a GUC once the cost model is fixed.
>
> I don't think that's true at all. Not all hardware is the same.
>
> Certainly the need to twiddle this GUC should be drastically reduced if
> the cache effects are modelled properly and the only excuses left are
> legitimate hardware differences.

OK ... but still, it should become a "little knob" rather than the "big
knob" it is currently.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

In response to

Re: More thoughts about planner's cost estimates at 2006-06-01 18:25:56 from Greg Stark

Responses

Re: More thoughts about planner's cost estimates at 2006-06-01 19:14:17 from Greg Stark
