Re: Improving N-Distinct estimation by ANALYZE

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, tshipley(at)deru(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving N-Distinct estimation by ANALYZE
Date: 2006-01-05 19:58:18
Message-ID: 20060105195818.GV43311@pervasive.com
Lists: pgsql-hackers

On Thu, Jan 05, 2006 at 10:12:29AM -0500, Greg Stark wrote:
> Worse, my recollection from the paper I mentioned earlier was that sampling
> small percentages like 3-5% didn't get you an acceptable accuracy. Before you
> got anything reliable you found you were sampling very large percentages of
> the table. And note that if you have to sample anything over 10-20% you may as
> well just read the whole table. Random access reads are that much slower.

If I'm reading backend/commands/analyze.c right, the heap is accessed
linearly: only the selected blocks are read, but they are read in heap
order, which shouldn't be anywhere near as bad as random access.
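
To illustrate the access pattern I mean, here is a minimal sketch (not the
code in analyze.c) of picking a random subset of blocks while only ever
moving forward through the heap. It uses selection sampling (Knuth's
Algorithm S); whether that matches what analyze.c actually does is an
assumption on my part, and the function name and parameters are made up for
the example.

```python
import random


def sample_blocks_in_order(total_blocks, sample_size, rng=None):
    """Yield distinct block numbers in ascending (heap) order.

    Hypothetical helper for illustration only. Each block is visited once,
    front to back, and is chosen with probability
    (blocks still needed) / (blocks still remaining),
    which gives a uniform random sample without any backward seeks.
    """
    rng = rng or random.Random()
    still_needed = min(sample_size, total_blocks)
    for block in range(total_blocks):
        if still_needed == 0:
            break  # sample is complete, no need to look at the rest
        still_remaining = total_blocks - block
        if rng.random() * still_remaining < still_needed:
            still_needed -= 1
            yield block  # a real reader would fetch this block here


if __name__ == "__main__":
    # Sample 30 of 1000 blocks; the block numbers come out already sorted.
    print(list(sample_blocks_in_order(1000, 30)))
```

The point is that the reads are skips forward over unselected blocks, never
jumps back and forth, so the cost should sit somewhere between a full
sequential scan and truly random I/O.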
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
