Re: Improving N-Distinct estimation by ANALYZE

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: Improving N-Distinct estimation by ANALYZE
Date:	2006-01-04 23:25:54
Message-ID:	200601041525.55084.josh@agliodbs.com
Lists:	pgsql-hackers

Tom,

> In general, estimating n-distinct from a sample is just plain a hard
> problem, and it's probably foolish to suppose we'll ever be able to
> do it robustly. What we need is to minimize the impact when we get
> it wrong.

Well, I think it's pretty well proven that to be accurate at all you need
to be able to sample at least 5% of the table, even if some users choose to
sample less. Also, I don't think anyone on this list disputes that the
current algorithm is very inaccurate for large tables. Or do they?

While I don't think that we can estimate N-distinct completely accurately,
I do think that we can get within +/- 5x for 80-90% of all cases, instead
of 40-50% of cases like now. We can't be perfectly accurate, but we can
be *more* accurate.
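
To make the "+/- 5x" target concrete, here is a minimal sketch. It assumes that "+/- 5x" means the estimate lies between one fifth and five times the true number of distinct values, which is one reading of the phrase and not something this message spells out. The estimator shown is the Haas-Stokes Duj1 formula, n*d / (n - f1 + f1*n/N), used only as an illustration; the function names are hypothetical.

```python
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Estimate the number of distinct values in a table from a row sample.

    Uses the Haas-Stokes Duj1 formula n*d / (n - f1 + f1*n/N), where
    n is the sample size, N the table size, d the number of distinct
    values seen in the sample, and f1 the number seen exactly once.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    return n * d / (n - f1 + f1 * n / total_rows)


def ratio_error(estimate, actual):
    """Multiplicative error: 1.0 is exact, 5.0 means off by a factor of 5."""
    return max(estimate / actual, actual / estimate)


def within_factor(estimate, actual, factor=5.0):
    """True if the estimate is within the given factor of the true value."""
    return ratio_error(estimate, actual) <= factor


if __name__ == "__main__":
    # Hypothetical 4-row sample drawn from an 8-row table.
    est = duj1_estimate([1, 1, 2, 3], total_rows=8)
    print(est, within_factor(est, actual=6))
```

Measuring error as a ratio treats over- and under-estimates symmetrically, which matches a "within 5x" goal better than an absolute difference would.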

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-04 19:49:16 from Tom Lane

Responses

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-05 00:22:05 from Greg Stark
