Quick Links

Re: More thoughts about planner's cost estimates

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, josh(at)agliodbs(dot)com, David Fetter <david(at)fetter(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: More thoughts about planner's cost estimates
Date:	2006-06-02 21:41:07
Message-ID:	87k67zdymk.fsf@stark.xeocode.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > And a 5% sample is a pretty big. In fact my tests earlier showed the i/o from
> > 5% block sampling took just as long as reading all the blocks. Even if we
> > figure out what's causing that (IMHO surprising) result and improve matters I
> > would only expect it to be 3-4x faster than a full scan.
>
> One way to reduce the I/O pain from extensive sampling would be to turn
> VACUUM ANALYZE into a genuine combined operation instead of a mere
> notational shorthand for two separate scans.

Except that we're also looking for every way we can to avoid having vacuum
have to touch every page so large tables with narrow hot spots can still
afford to be vacuumed regularly during peak hours.

But for most users analyze doesn't really have to run as often as vacuum. One
sequential scan per night doesn't seem like that big a deal to me.

> I'd still be worried about the CPU pain though. ANALYZE can afford to
> expend a pretty fair number of cycles per sampled tuple, but with a
> whole-table sample that's going to add up.

That is a concern. Though the algorithm is pretty efficient, it basically
amounts to hashing all the tuples keeping only a limited number of hash
buckets and just throwing away the rest of the data.

Things could get more complicated if we get to more complex stats than simple
n_distinct estimates and performing dependence tests on the data for
multi-column statistics.

--
greg

In response to

Re: More thoughts about planner's cost estimates at 2006-06-02 20:43:36 from Tom Lane

Responses

Re: More thoughts about planner's cost estimates at 2006-06-02 22:15:43 from Josh Berkus

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Greg Stark	2006-06-02 21:46:41	Re: COPY (query) TO file
Previous Message	David Fetter	2006-06-02 21:34:48	Re: COPY (query) TO file