Re: estimating # of distinct values

From: tv(at)fuzzy(dot)cz
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Tomas Vondra" <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: estimating # of distinct values
Date: 2010-12-28 15:55:12
Message-ID: d7cb4a682509456a0a51a994cfdc138c.squirrel@sq.gransy.com
Lists: pgsql-hackers

> <tv(at)fuzzy(dot)cz> wrote:
>
>> So even with 10% of the table, there's a 10% probability to get an
>> estimate that's 7x overestimated or underestimated. With lower
>> probability the interval is much wider.
>
> Hmmm... Currently I generally feel I'm doing OK when the estimated
> rows for a step are in the right order of magnitude -- a 7% error
> would be a big improvement in most cases. Let's not lose track of
> the fact that these estimates are useful even when they are not
> dead-on accurate.

Well, but that's not 7%, that's 7x! And the theorem says 'greater or
equal', so this is actually the minimum - with lower probability you can
get a much bigger difference. So you can easily get an estimate that is
a few orders of magnitude off.
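
To put that in numbers: if a column really has 1,000,000 distinct
values, a 7x error means the estimate can land anywhere between roughly
143,000 (1,000,000 / 7) and 7,000,000 - and by the theorem about one
sample in ten will be at least that far off.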

Anyway, I really don't want precise values, just a reasonable estimate.
As I said, we could use the AE estimator proposed in the paper. It has
the nice property that it actually reaches the lower bound (so the
inequality turns into an equality). The downside is that there are
estimators that behave better on some datasets.
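
Just to make the sampling-based approach concrete, here is a rough
sketch in Python (purely illustrative; the names are mine, and the
formula is the simpler GEE-style estimator as I remember it from that
family of papers, not the AE estimator itself):

    from collections import Counter
    from math import sqrt

    def gee_estimate(sample, n_total_rows):
        # sample: column values drawn uniformly at random from the table
        # n_total_rows: total number of rows in the table
        sample = list(sample)
        r = len(sample)
        if r == 0:
            return 0
        counts = Counter(sample)                 # value -> occurrences in sample
        freq_of_freq = Counter(counts.values())  # j -> values seen exactly j times
        f1 = freq_of_freq.get(1, 0)              # singletons in the sample
        repeated = sum(c for j, c in freq_of_freq.items() if j >= 2)
        # GEE: scale the singletons up by sqrt(n/r), count repeated values once
        return sqrt(n_total_rows / r) * f1 + repeated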

Tomas
