Re: Improving N-Distinct estimation by ANALYZE

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving N-Distinct estimation by ANALYZE
Date: 2006-01-06 16:09:06
Message-ID: 20060106160906.GH3902@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 06, 2006 at 01:24:41AM -0500, Greg Stark wrote:
> > 5% based on block-based sampling is reasonable; that means a straight 5% of
> > the on-disk size of the table, so 5gb for a 100gb table. With random-row
> > sampling, that would require as much as 25% of the table, making it easier
> > to just scan the whole thing.
>
> Postgres's current sample sizes are clearly geared towards the histograms
> where they are entirely realistic. All of the distinct estimates are clearly
> just ad hoc attempts based on the existing sampling.
>
> Is a mechanism that is only 5x faster than reading the whole table (assuming
> random_page_cost of 4) and is off by more than a factor of three 10% of the
> time really worth it?

Before we start debating merits of proposals based on random reads, can
someone confirm that the sampling code actually does read randomly? I
looked at it yesterday; there is a comment that states that blocks to be
scanned are passed to the analyze function in physical order, and AFAICT
the function that chooses blocks does so based strictly on applying a
probability function to block numbers as it increments a counter. It
seems that any reading is actually sequential and not random, which
makes all the random_page_cost hand-waving null and void.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-01-06 16:18:15 Re: could not access status of transaction 0
Previous Message Stefan Kaltenbrunner 2006-01-06 16:01:57 could not access status of transaction 0