
Re: ANALYZE to be ignored by VACUUM

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ANALYZE to be ignored by VACUUM
Date:	2008-02-19 08:56:04
Message-ID:	87zltxz5nf.fsf@oxford.xeocode.com
Lists:	pgsql-hackers

"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:

> 4. ANALYZE finishes in a short time.
> It is ok that VACUUM takes a long time because it is not a transaction,
> but ANALYZE should not. It requires a cleverer statistics algorithm.
> A sampling factor of 10 is not enough for pg_stats.n_distinct. We seem to
> estimate n_distinct too low for clustered (ordered) tables.

Unfortunately, no constant-size sample is going to be enough for reliable
n_distinct estimates. To estimate n_distinct you really have to see a
percentage of the table, and to get good estimates that percentage has to be
fairly large.
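
To make the point above concrete, here is a rough Python sketch. It assumes
the Haas-Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), which as far as I
know is what ANALYZE applies to its sample. The table layout, the function
names and the sample size of 3000 are made up for illustration; no real table
is read.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Haas-Stokes Duj1 estimate of n_distinct: n*d / (n - f1 + f1*n/N)."""
    n = len(sample)
    counts = Counter(sample)
    d = len(counts)                                 # distinct values in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return n * d / (n - f1 + f1 * n / total_rows)


def sample_estimate(total_rows, rows_per_value, sample_size, seed):
    """Sample rows uniformly from a hypothetical table in which every value
    occupies rows_per_value rows, then estimate n_distinct from the sample."""
    rng = random.Random(seed)
    # Draw row numbers without materializing the table.
    row_ids = rng.sample(range(total_rows), sample_size)
    sample = [row_id // rows_per_value for row_id in row_ids]
    return duj1_estimate(sample, total_rows)


if __name__ == "__main__":
    total_rows = 100_000_000
    for rows_per_value in (1, 2, 10):
        true_distinct = total_rows // rows_per_value
        estimates = [
            sample_estimate(total_rows, rows_per_value, 3000, seed)
            for seed in range(5)
        ]
        print(true_distinct, [round(e) for e in estimates])
```

When the sample contains no duplicates (d = f1 = n), the formula collapses to
N, i.e. "every row is distinct". With 3000 rows sampled from 100 million, a
table holding two rows per value is expected to produce only about 0.05
duplicate pairs in the sample, so it usually looks identical to a table with
no repeats at all, and a single chance duplicate moves the estimate by more
than an order of magnitude. Only a sample that grows with the table sees
enough repeats to tell such tables apart.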

A paper posted a while back described a nice algorithm that required only
constant memory, but it depended on scanning the entire table. I think that to
do n_distinct estimates we'll need statistics that are either gathered
opportunistically whenever a seqscan happens or maintained by an index.
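
The paper is not identified here, so the following is not its algorithm. It
is a sketch of one well-known technique with the same shape, a
K-minimum-values (KMV) summary: it looks at every value once, keeps only the k
smallest hash values, and estimates the distinct count from the k-th smallest.
The function names, the choice of SHA-1 and the default k are assumptions for
illustration only.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64


def _hash64(value):
    """Map a value to a roughly uniform integer in [0, 2**64)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def kmv_distinct_estimate(values, k=1024):
    """Estimate the number of distinct items in `values` in one full pass,
    remembering at most k hash values regardless of input size."""
    heap = []        # max-heap (by negation) of the k smallest hashes seen
    members = set()  # the same hashes, for duplicate detection
    for value in values:
        h = _hash64(value)
        if h in members:
            continue                      # already tracked
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))           # fewer than k distinct values: exact
    kth_smallest = -heap[0]
    return (k - 1) * HASH_SPACE / kth_smallest


if __name__ == "__main__":
    # Hypothetical column: 50,000 distinct values, each stored twice.
    column = (i % 50_000 for i in range(100_000))
    print(round(kmv_distinct_estimate(column)))
```

Memory stays bounded by k, but every row has to pass through the loop, which
is why this kind of statistic fits a seqscan that is happening anyway or a
structure that already visits every entry, rather than a fixed-size sample.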

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

In response to

Re: ANALYZE to be ignored by VACUUM at 2008-02-19 07:31:20 from ITAGAKI Takahiro

Responses

Re: ANALYZE to be ignored by VACUUM at 2008-02-20 04:17:45 from ITAGAKI Takahiro
