Re: Understanding histograms

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: len(at)cs(dot)pdx(dot)edu
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Understanding histograms
Date: 2008-04-30 05:19:51
Message-ID: 14251.1209532791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Len Shapiro <len(at)cs(dot)pdx(dot)edu> writes:
> 1. Why does Postgres come up with a negative n_distinct?

It's a fractional representation. Per the docs:

> stadistinct float4 The number of distinct nonnull data values in the column. A value greater than zero is the actual number of distinct values. A value less than zero is the negative of a fraction of the number of rows in the table (for example, a column in which values appear about twice on the average could be represented by stadistinct = -0.5). A zero value means the number of distinct values is unknown

> The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
> sense to me for other values of const not in the MVC list.
> For example, if I run the query
> EXPLAIN SELECT * from sailors where rank = -1000;
> Postgres still gives an estimate of "row=2".

I'm not sure what estimate you'd expect instead? The code has a built in
assumption that no value not present in the MCV list can be more
frequent than the last member of the MCV list, so it's definitely not
gonna guess *more* than 2.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gauri Kanekar 2008-04-30 05:29:53 Re: Replication Syatem
Previous Message Len Shapiro 2008-04-30 04:56:32 Understanding histograms