Re: benchmarking the query planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Nathan Boley" <npboley(at)gmail(dot)com>
Cc: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-12 01:50:10
Message-ID: 15086.1229046610@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Nathan Boley" <npboley(at)gmail(dot)com> writes:
> Isn't a selectivity estimate of x = v as ( the number of values in v's
> histogram bucket ) / ( number of distinct values in v's histogram
> bucket ) pretty rational? Thats currently what we do for non-mcv
> values, except that we look at ndistinct over the whole table instead
> of individual histogram buckets.

But the histogram buckets are (meant to be) equal-population, so it
should come out the same either way. The removal of MCVs from the
population will knock any possible variance in ndistinct down to the
point where I seriously doubt that this could offer a win. An even
bigger problem is that this requires estimation of ndistinct among
fractions of the population, which will be proportionally less accurate
than the overall estimate. Accurate ndistinct estimation is *hard*.

> now, if there are 100 histogram buckets then any values that occupy
> more than 1% of the table will be mcv's regardless - why force a value
> to be an mcv if it only occupies 0.1% of the table?

Didn't you just contradict yourself? The cutoff would be 1% not 0.1%.
In any case there's already a heuristic to cut off the MCV list at some
shorter length (ie, more than 1% in this example) if it seems not
worthwhile to keep the last entries. See lines 2132ff (in CVS HEAD)
in analyze.c.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2008-12-12 01:52:00 Re: benchmarking the query planner
Previous Message Nathan Boley 2008-12-12 01:35:18 Re: benchmarking the query planner