Re: estimating # of distinct values

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Nathan Boley <npboley(at)gmail(dot)com>
Cc:	Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: estimating # of distinct values
Date:	2011-01-19 23:56:18
Message-ID:	0ED6A735-4377-47DC-AEF4-C55F54BD06C4@phlo.org
Lists:	pgsql-hackers

On Jan19, 2011, at 23:44 , Nathan Boley wrote:
> If you think about it, it's a bit ridiculous to look at the whole table
> *just* to "estimate" ndistinct - if we go that far why don't we just
> store the full distribution and be done with it?

The crucial point that you're missing here is that ndistinct provides an
estimate even if you *don't* have a specific value to search for at hand.
This is way more common than you may think; e.g. it happens every time you
PREPARE a statement with parameters. Even knowing the full distribution
has no advantage in this case - the best you could do is to average the
individual probabilities which gives ... well, 1/ndistinct.
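
To make the averaging argument concrete, here is a minimal sketch in Python. The function name average_selectivity and the sample column are hypothetical, chosen only for illustration, and this is not how the planner computes anything. It assumes "averaging the individual probabilities" means taking each distinct value's share of the rows and averaging those shares with equal weight per distinct value. Since the shares sum to 1, the average is 1/ndistinct however skewed the column is.

```python
from collections import Counter
from fractions import Fraction


def average_selectivity(values):
    """Average, over the distinct values, of each value's share of the rows.

    Hypothetical helper for illustration; exact fractions avoid rounding noise.
    """
    counts = Counter(values)
    total = len(values)
    # Share of rows held by each distinct value (the "full distribution").
    freqs = [Fraction(count, total) for count in counts.values()]
    # The shares sum to 1, so their unweighted mean is 1 / ndistinct.
    return sum(freqs) / len(freqs)


# A heavily skewed, made-up column: 90% "a", 7% "b", 3% "c".
column = ["a"] * 90 + ["b"] * 7 + ["c"] * 3
ndistinct = len(set(column))

print(average_selectivity(column), Fraction(1, ndistinct))
```

Both printed values are the same fraction, so for a parameter whose value is unknown at plan time the full distribution adds nothing beyond ndistinct under this equal-weight assumption.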

best regards,
Florian Pflug

In response to

Re: estimating # of distinct values at 2011-01-19 22:44:59 from Nathan Boley

Responses

Re: estimating # of distinct values at 2011-01-20 01:41:51 from Nathan Boley
