
Re: [HACKERS] Bad n_distinct estimation; hacks suggested?

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
Cc:	josh(at)agliodbs(dot)com, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] Bad n_distinct estimation; hacks suggested?
Date:	2005-04-27 13:43:31
Message-ID:	426F9703.4010108@dunslane.net
Lists:	pgsql-hackers pgsql-performance

Mischa Sandberg wrote:

>
>Perhaps I can save you some time (yes, I have a degree in Math). If I
>understand correctly, you're trying to extrapolate from the correlation
>between a tiny sample and a larger sample. Introducing the tiny sample
>into any decision can only produce a less accurate result than just
>taking the larger sample on its own; GIGO. Whether they are consistent
>with one another has no relationship to whether the larger sample
>correlates with the whole population. You can think of the tiny sample
>like "anecdotal" evidence for wonderdrugs.

Ok, good point.

I'm with Tom though in being very wary of solutions that require even
one-off whole table scans. Maybe we need an additional per-table
statistics setting which could specify the sample size, either as an
absolute number or as a percentage of the table. It certainly seems that
where D/N ~ 0.3, the estimates, at least on very large tables, are way,
way off.
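To make the proposed per-table setting concrete, here is a minimal sketch of how a sample size could be resolved from either an absolute row count or a percentage of the table. Everything here is hypothetical: `SampleSetting`, `resolve_sample_size`, and the fallback of 30000 rows are illustrative names and values, not existing PostgreSQL settings or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleSetting:
    # Hypothetical per-table setting: give either an absolute row count
    # or a percentage of the table, not both.
    rows: Optional[int] = None
    percent: Optional[float] = None


def resolve_sample_size(setting: SampleSetting, table_rows: int,
                        default_rows: int = 30000) -> int:
    """Return how many rows to sample (hypothetical helper).

    default_rows is an assumed fallback used when neither option is set.
    """
    if setting.rows is not None and setting.percent is not None:
        raise ValueError("specify rows or percent, not both")
    if setting.rows is not None:
        n = setting.rows
    elif setting.percent is not None:
        if not 0 < setting.percent <= 100:
            raise ValueError("percent must be in (0, 100]")
        n = int(table_rows * setting.percent / 100)
    else:
        n = default_rows
    # Sample at least one row, but never more rows than the table holds
    # (an empty table yields 0).
    return min(max(n, 1), table_rows)


if __name__ == "__main__":
    # A 1% sample of a 10M-row table is 100000 rows, far above a fixed default.
    print(resolve_sample_size(SampleSetting(percent=1.0), 10_000_000))
```

A percentage lets the sample grow with the table, which is the case where a fixed-size sample seems to go wrong; an absolute count keeps ANALYZE cost predictable. Which to prefer per table is the judgment call the setting would expose.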

Or maybe we need to support more than one estimation method.

Or both ;-)
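One way to read "more than one estimation method" is a pluggable choice of n_distinct estimator. The sketch below assumes a dispatch table keyed by method name. The `duj1` entry is the Haas-Stokes estimator, n*d / (n - f1 + f1*n/N), where n is the sample size, N the table size, d the distinct values in the sample, and f1 the values seen exactly once; PostgreSQL's ANALYZE uses this formula. The `scaleup` entry is a deliberately naive alternative included only for illustration. The function and method names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence


def haas_stokes_duj1(sample: Sequence[Hashable], total_rows: int) -> float:
    """Haas-Stokes Duj1: n*d / (n - f1 + f1*n/N)."""
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    if n >= total_rows:
        return float(d)  # the sample is the whole table: d is exact
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen once
    return n * d / (n - f1 + f1 * n / total_rows)


def linear_scaleup(sample: Sequence[Hashable], total_rows: int) -> float:
    """Naive illustrative alternative: scale d by N/n, capped at N."""
    n = len(sample)
    if n == 0:
        return 0.0
    d = len(set(sample))
    return min(d * total_rows / n, float(total_rows))


# Hypothetical registry of estimation methods, selectable by name.
ESTIMATORS: Dict[str, Callable[[Sequence[Hashable], int], float]] = {
    "duj1": haas_stokes_duj1,
    "scaleup": linear_scaleup,
}


def estimate_n_distinct(sample: Sequence[Hashable], total_rows: int,
                        method: str = "duj1") -> float:
    return ESTIMATORS[method](sample, total_rows)


if __name__ == "__main__":
    sample = [1, 1, 2, 2]  # every value repeats, so f1 = 0
    print(estimate_n_distinct(sample, 100))             # Duj1
    print(estimate_n_distinct(sample, 100, "scaleup"))  # naive scale-up
```

The two methods can disagree sharply on the same sample: with no singletons, Duj1 concludes the sample has seen every value (2.0), while the naive scale-up extrapolates to 50.0. That spread is one argument for letting the estimator be chosen rather than fixed.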

cheers

andrew

In response to

Re: [HACKERS] Bad n_distinct estimation; hacks suggested? at 2005-04-27 05:38:04 from Mischa Sandberg

Responses

Re: [HACKERS] Bad n_distinct estimation; hacks suggested? at 2005-04-27 15:25:16 from Josh Berkus
Re: Distinct-Sampling (Gibbons paper) for Postgres at 2005-04-29 05:10:18 from a3a18850
