Re: estimating # of distinct values

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	tv(at)fuzzy(dot)cz
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: estimating # of distinct values
Date:	2010-12-29 23:47:56
Message-ID:	4D1BC8AC.9030907@agliodbs.com
Lists:	pgsql-hackers

> Well, but that's not 7%, that's 7x! And the theorem says 'greater or equal'
> so this is actually the minimum - you can get a much bigger difference
> with lower probability. So you can easily get an estimate that is a few
> orders off.

FWIW, based on query performance, estimates which are up to 5X off are
tolerable, and anything within 3X is considered "accurate". Above 5X
the probability of bad query plans becomes problematically high.
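
To make those thresholds concrete, here is a minimal sketch that treats "NX off" as a symmetric ratio between the estimated and actual number of distinct values. The function names and the labels are hypothetical, chosen only for illustration; the 3X and 5X cutoffs are the rules of thumb stated above, not anything the planner enforces.

```python
def ratio_error(estimated: float, actual: float) -> float:
    """Return the multiplicative error factor, always >= 1.

    An estimate of 100 against an actual 1000 and an estimate of
    1000 against an actual 100 are both treated as "10X off".
    """
    if estimated <= 0 or actual <= 0:
        raise ValueError("both counts must be positive")
    return max(estimated / actual, actual / estimated)


def classify_estimate(estimated: float, actual: float) -> str:
    """Bucket an ndistinct estimate using the 3X / 5X rules of thumb."""
    err = ratio_error(estimated, actual)
    if err <= 3:
        return "accurate"      # within 3X
    if err <= 5:
        return "tolerable"     # up to 5X off
    return "risky"             # above 5X: bad plans become likely


if __name__ == "__main__":
    for est, act in [(1000, 400), (1000, 250), (100, 1000)]:
        print(est, act, classify_estimate(est, act))
```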

Of course, if you're doing cross-column stats, the accuracy of each
individual column becomes critical, since estimation error could be
combinational in the worst case (e.g. if colA is 3X and colB is 0.3X
then colA<->colB will be roughly 9X off).
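
The worst case above can be sketched as a product of per-column error factors. This assumes the cross-column estimate combines the individual errors multiplicatively and that they happen to push in opposing directions, which is an illustrative assumption rather than a description of any particular estimator; the function name is hypothetical. It needs Python 3.8+ for math.prod.

```python
import math


def worst_case_combined_error(factors):
    """Multiply per-column error factors after normalizing each to >= 1.

    A factor of 3.0 means "3X too high"; a factor of 0.3 means the
    estimate is 0.3 times the true value, i.e. about 3.3X too low.
    """
    normalized = []
    for f in factors:
        if f <= 0:
            raise ValueError("error factors must be positive")
        normalized.append(max(f, 1.0 / f))
    return math.prod(normalized)


if __name__ == "__main__":
    # colA is 3X too high, colB is about 3X too low.
    print(worst_case_combined_error([3.0, 1 / 3]))
    # Taking 0.3X literally gives a slightly larger factor.
    print(worst_case_combined_error([3.0, 0.3]))
```

Treating 0.3X as "about 3X too low" gives the roughly 9X figure; taking 0.3 literally gives a factor closer to 10X. Either way, two columns that each count as "accurate" on their own can combine into an estimate well past the 5X mark.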

Anyway, I look forward to your experiments with stream-based estimators.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

In response to

Re: estimating # of distinct values at 2010-12-28 15:55:12 from tv
