Re: Improving N-Distinct estimation by ANALYZE

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Improving N-Distinct estimation by ANALYZE
Date:	2006-01-05 15:02:11
Message-ID:	87irsy4t1o.fsf@stark.xeocode.com
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:

> > Only if your sample is random and independent. The existing mechanism tries
> > fairly hard to ensure that every record has an equal chance of being selected.
> > If you read the entire block and not appropriate samples then you'll introduce
> > systematic sampling errors. For example, if you read an entire block you'll be
> > biasing towards smaller records.
>
> Did you read any of the papers on block-based sampling? These sorts of issues
> are specifically addressed in the algorithms.

We *currently* use a block-based sampling algorithm that addresses this issue
by taking care to select rows within the selected blocks in an unbiased way.
You were proposing reading *all* the records from the selected blocks, which
throws away that feature.
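
To make the distinction concrete, here is a rough sketch of the two-stage idea:
pick some blocks at random, then pick rows from those blocks so that every row
in them has the same chance of ending up in the sample. This is not the ANALYZE
code. The function name, the table layout and the parameters are made up for
illustration, and it uses the simple "Algorithm R" reservoir, which is an
assumption on my part and not necessarily what the backend does.

```python
import random

def two_stage_sample(blocks, n_blocks, n_rows, rng):
    """Illustrative only: blocks is a list of lists of rows.

    Stage 1 picks n_blocks block numbers uniformly without replacement.
    Stage 2 runs a reservoir (Algorithm R) over every row in those blocks,
    so each such row is kept with probability n_rows / total_rows_seen.
    """
    chosen = rng.sample(range(len(blocks)), min(n_blocks, len(blocks)))
    reservoir = []
    seen = 0
    for b in chosen:
        for row in blocks[b]:
            seen += 1
            if len(reservoir) < n_rows:
                # Fill the reservoir with the first n_rows rows.
                reservoir.append(row)
            else:
                # Replace a random slot with probability n_rows / seen.
                j = rng.randrange(seen)
                if j < n_rows:
                    reservoir[j] = row
    return chosen, reservoir

# Hypothetical table: 100 blocks holding 5, 15, 25 or 35 rows each,
# standing in for blocks of wide versus narrow records.
rng = random.Random(0)
blocks = [[(b, i) for i in range(5 + (b % 4) * 10)] for b in range(100)]
chosen, sample = two_stage_sample(blocks, n_blocks=30, n_rows=30, rng=rng)
```

The sample stays at n_rows no matter how many rows the chosen blocks hold, and
a row's chance of being kept does not depend on how many other rows share its
block. Keeping *all* the rows from the chosen blocks skips the second stage
entirely.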

--
greg

In response to

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-05 06:28:23 from Josh Berkus

Responses

Re: Improving N-Distinct estimation by ANALYZE at 2006-01-05 19:40:19 from Josh Berkus
