
Re: Improving N-Distinct estimation by ANALYZE

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Improving N-Distinct estimation by ANALYZE
Date:	2006-01-14 01:19:05
Message-ID:	200601131719.05197.josh@agliodbs.com
Lists:	pgsql-hackers

Simon,

> It's also worth mentioning that for datatypes that only have an "="
> operator the performance of compute_minimal_stats is O(N^2) when values
> are unique, so increasing sample size is a very bad idea in that case.
> It may be possible to re-sample the sample, so that we get only one row
> per block as with the current row sampling method. Another idea might be
> just to abort the analysis when it looks fairly unique, rather than
> churn through the whole sample.
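To make the quoted cost argument concrete, here is a minimal Python sketch. It models a datatype that offers only an equality test (no ordering, no hash), so each new sample value must be compared against every distinct value seen so far. It is a simplified model, not PostgreSQL's actual compute_minimal_stats. The helper names `count_distinct_eq_only` and `one_row_per_block` are hypothetical. The second helper illustrates the suggested re-sampling of the sample down to one row per block, assuming each sampled row carries its block number.

```python
import random
from collections import defaultdict


def count_distinct_eq_only(sample):
    """Count distinct values using only "=" (hypothetical helper, simplified model).

    Returns (n_distinct, comparisons). When every value is unique, each new
    value is compared against all earlier ones: n*(n-1)/2 comparisons.
    """
    seen = []          # distinct values found so far, in arrival order
    comparisons = 0
    for v in sample:
        for s in seen:
            comparisons += 1
            if v == s:     # the only operator the datatype provides
                break
        else:              # loop ended without a match: v is a new distinct value
            seen.append(v)
    return len(seen), comparisons


def one_row_per_block(rows, rng):
    """Keep one randomly chosen row per block (hypothetical helper).

    rows: iterable of (block_number, value) pairs from the row sample.
    Returns the kept values ordered by block number.
    """
    by_block = defaultdict(list)
    for blk, val in rows:
        by_block[blk].append(val)
    return [rng.choice(vals) for _, vals in sorted(by_block.items())]


if __name__ == "__main__":
    for n in (1000, 2000):
        print(n, count_distinct_eq_only(list(range(n))))
    rows = [(b, b * 10 + i) for b in range(5) for i in range(3)]
    print(one_row_per_block(rows, random.Random(0)))
```

Doubling an all-unique sample from 1,000 to 2,000 values raises the comparison count from 499,500 to 1,999,000, roughly four times as many, which is why a larger sample hurts most in exactly the unique case.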

I'd tend to do the latter. If we haven't had a value repeat in 25 blocks,
how likely is one to appear later?
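One way to read the early-abort idea is as a scan over the sampled rows in block order that gives up once a fixed number of whole blocks have gone by without a repeat. The sketch below takes the 25-block figure from the message as its default threshold and uses the same equality-only test. `looks_unique` and `p_no_repeat` are hypothetical names. The second function puts a rough number on "how likely": if a column had D equally frequent distinct values, the chance that m sampled rows contain no repeat is the birthday-problem product of (1 - i/D) for i from 0 to m-1. Real columns are rarely uniform, so treat it only as an illustration.

```python
def looks_unique(rows, blocks_without_repeat=25):
    """Early-abort uniqueness check (hypothetical helper).

    rows: (block_number, value) pairs, sorted by block number.
    Returns False at the first repeated value. Returns True once
    `blocks_without_repeat` whole blocks have been scanned with no repeat,
    or when the sample runs out without a repeat.
    """
    seen = []
    blocks = set()
    for blk, v in rows:
        if blk not in blocks:
            if len(blocks) >= blocks_without_repeat:
                return True        # threshold reached: stop scanning early
            blocks.add(blk)
        if any(v == s for s in seen):  # equality-only comparison
            return False
        seen.append(v)
    return True


def p_no_repeat(m, d):
    """Probability that m draws from d equally likely values are all distinct."""
    p = 1.0
    for i in range(m):
        p *= 1.0 - i / d
    return p


if __name__ == "__main__":
    print(looks_unique([(0, "a"), (1, "b"), (2, "a")], blocks_without_repeat=2))
    print(round(p_no_repeat(25, 10000), 3))
```

The first call shows the trade-off: with a two-block threshold the scan stops before it reaches the repeated "a". The second shows that, with one sampled row per block, 25 repeat-free rows are still about 97% likely even for a uniform column with only 10,000 distinct values.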

Hmmm ... does ANALYZE check for UNIQUE constraints? Most columns whose values
are unique are going to have a constraint, in which case we don't need to
sample them at all for N-distinct.
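If ANALYZE did consult constraints, the shortcut might look like the sketch below. It assumes a hypothetical `ColumnInfo` summary recording whether the column alone is covered by a UNIQUE or primary-key constraint. It returns a value in the spirit of the pg_stats n_distinct convention, where a negative number is minus the ratio of distinct values to rows, so -1 means every row is distinct. Because a UNIQUE constraint still permits multiple NULLs, the sketch scales by the non-null fraction, which would still have to come from the sample.

```python
from dataclasses import dataclass


@dataclass
class ColumnInfo:                     # hypothetical catalog summary
    name: str
    has_single_column_unique: bool    # covered alone by a UNIQUE/PK constraint
    null_frac: float                  # fraction of rows that are NULL (from the sample)


def n_distinct_estimate(col, sample_estimate):
    """Return an n_distinct-style value (hypothetical helper).

    Negative values mean -(distinct values / rows), so -1.0 means every
    row holds a distinct value.
    """
    if col.has_single_column_unique:
        # UNIQUE allows multiple NULLs, so only non-null rows are distinct.
        return -(1.0 - col.null_frac)
    return sample_estimate            # fall back to the sampled estimate


if __name__ == "__main__":
    print(n_distinct_estimate(ColumnInfo("id", True, 0.0), 123.0))
    print(n_distinct_estimate(ColumnInfo("code", True, 0.25), 123.0))
```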

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-13 19:18:29 from Simon Riggs

Responses

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-14 04:37:38 from Tom Lane
