Re: Understanding histograms

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: len(at)pdx(dot)edu, len(at)cs(dot)pdx(dot)edu, pgsql-performance(at)postgresql(dot)org
Subject: Re: Understanding histograms
Date: 2008-04-30 23:17:44
Message-ID: 26230.1209597464@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2008-04-30 at 10:43 -0400, Tom Lane wrote:
>> Surely that's not very sane? The MCV list plus histogram generally
>> don't include every value in the table.

> My understanding of Len's question is that, although the MCV list plus
> the histogram don't include every distinct value in the general case,
> they do include every value in the specific case where the histogram is
> not full.

I don't believe that's true. It's possible that a small histogram means
that you are seeing every value that was in ANALYZE's sample, but it's
a mighty long leap from that to the assumption that there are no other
values in the table. In any case that seems more an artifact of the
implementation than a property the histogram would be guaranteed to
have.

> ... the statistics aren't guaranteed to be perfectly up-to-date, so an
> estimate of zero might be risky.

Right. As a matter of policy we never estimate less than one matching
row; and I've seriously considered pushing that up to at least two rows
except when we see that the query condition matches a unique constraint.
You can get really bad join plans from overly-small estimates.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gregory Stark 2008-05-01 00:53:44 Re: Understanding histograms
Previous Message Jeff Davis 2008-04-30 22:47:02 Re: Understanding histograms