Re: Statistics and selectivity estimation for ranges

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Statistics and selectivity estimation for ranges
Date: 2012-08-21 13:25:02
Message-ID: 50338C2E.5060908@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20.08.2012 00:31, Alexander Korotkov wrote:
> On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas<
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>> On 15.08.2012 11:34, Alexander Korotkov wrote:
>>
>>> Ok, we've to decide if we need "standard" histogram. In some cases it can
>>> be used for more accurate estimation of< and> operators.
>>> But I think it is not so important. So, we can replace "standard"
>>> histogram
>>> with histograms of lower and upper bounds?
>>
>> Yeah, I think that makes more sense. The lower bound histogram is still
>> useful for< and> operators, just not as accurate if there are lots of
>> values with the same lower bound but different upper bound.
>
> New version of patch.
> * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower and
> upper bounds histograms combined into single ranges array, instead
> of STATISTIC_KIND_HISTOGRAM.

Ah, that's an interesting approach. So essentially, the histogram looks
just like a normal STATISTIC_KIND_HISTOGRAM histogram, but the values
stored in it are not picked the usual way. The usual way would be to
pick N evenly-spaced values from the column, and store those. Instead,
you pick N evenly-spaced lower bounds, and N evenly-spaced upper bounds,
and construct N range values from those. Looking at a single value in
the histogram, its lower bound comes from a different row than its upper
bound.

That's pretty clever - the histogram has a shape and order that's
compatible with a histogram you'd get with the standard scalar
typanalyze function. In fact, I think you could just let the standard
scalar estimators for < and > to use that histogram as is. Perhaps we
should use STATISTIC_KIND_HISTOGRAM for this after all...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sachin Srivastava 2012-08-21 13:34:40 PostgreSQL 9.2beta4 (& git HEAD) server crash on creating extension plpython3u
Previous Message Michael Paquier 2012-08-21 12:13:28 Re: multi-master pgbench?