Re: [HACKERS] Index Puzzle for you

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Kristofer Munn <kmunn(at)munn(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Index Puzzle for you
Date: 1999-12-29 10:12:55
Message-ID: 199912291012.FAA24890@candle.pha.pa.us
Lists: pgsql-hackers

> Tom Lane wrote:
> > The thing that jumps out at me from this example is the much larger
> > estimate of returned rows in the second case. The planner is clearly
>
> Good catch! There were 296 possible issues in the table. One had 86,544
> articles associated with it. The next highest was 5,949. Then the
> numbers drop to 630, 506, 412, 184 and then the rest are all under 62.
> Out of curiosity, how does vacuum decide on the large estimate?
>
> The maximum is 86,544.
> The average number of rows returned for ixissue = x is 3412.
> The median is 25.
> The mode is 25.
>
> ixissue is the result of a sequence.
>
> Thanks for the heads up on this...

Here is the relevant comment from vacuum.c. It is not perfect, but it was
the best thing I could think of.

---------------------------------------------------------------------------

/*
* vc_attrstats() -- compute column statistics used by the optimizer
*
* We compute the column min, max, null and non-null counts.
* Plus we attempt to find the count of the value that occurs most
* frequently in each column. These figures are used to compute
* the selectivity of the column.
*
* We use a three-bucket cache to get the most frequent item.
* The 'guess' buckets count hits. A cache miss causes guess1
* to get the most-hit 'guess' item in the most recent cycle, and
* the new item goes into guess2. Whenever the total count of hits
* of a 'guess' entry is larger than 'best', 'guess' becomes 'best'.
*
* This method works perfectly for columns with unique values, and columns
* with only two unique values, plus nulls.
*
* It becomes less perfect as the number of unique values increases and
* their distribution in the table becomes more random.
*
*/
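The comment describes the cache only in words, so here is a minimal sketch of how the three buckets might be updated, written from that description alone. It assumes an int column with NULLs signalled by an isnull flag; ColStats, col_stats_add() and the field names are hypothetical, and this is not the actual vc_attrstats() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-column statistics for an int column (illustrative names). */
typedef struct {
    long null_cnt;      /* NULLs seen */
    long nonnull_cnt;   /* non-NULL values seen */
    int  min, max;      /* valid once initialized is true */
    int  best;          /* current most-frequent candidate */
    long best_cnt;      /* total hits credited to best */
    int  guess1;        /* runner-up bucket */
    long guess1_cnt;    /* total hits credited to guess1 */
    long guess1_hits;   /* guess1 hits in the current cycle */
    int  guess2;        /* newest value taken in on a cache miss */
    long guess2_hits;   /* guess2 hits in the current cycle */
    bool initialized;
} ColStats;

static void col_stats_init(ColStats *s) { memset(s, 0, sizeof *s); }

static void swap_int(int *a, int *b)    { int t = *a; *a = *b; *b = t; }
static void swap_long(long *a, long *b) { long t = *a; *a = *b; *b = t; }

/* Feed one column value into the statistics. */
static void col_stats_add(ColStats *s, bool isnull, int value)
{
    bool hit = true;

    assert(s != NULL);
    if (isnull) {
        s->null_cnt++;
        return;
    }
    s->nonnull_cnt++;

    if (!s->initialized) {
        /* Seed every bucket with the first value; best_cnt is bumped below. */
        s->min = s->max = value;
        s->best = s->guess1 = s->guess2 = value;
        s->best_cnt = 0;
        s->guess1_cnt = s->guess1_hits = 1;
        s->guess2_hits = 1;
        s->initialized = true;
    }
    if (value < s->min) s->min = value;
    if (value > s->max) s->max = value;

    /* Look the value up in best, then guess1, then guess2. */
    if (value == s->best) {
        s->best_cnt++;
    } else if (value == s->guess1) {
        s->guess1_cnt++;
        s->guess1_hits++;
    } else if (value == s->guess2) {
        s->guess2_hits++;
    } else {
        hit = false;
    }

    /* guess1 keeps whichever guess was hit more often in this cycle. */
    if (s->guess2_hits > s->guess1_hits) {
        swap_int(&s->guess1, &s->guess2);
        s->guess1_cnt = s->guess2_hits;
        swap_long(&s->guess1_hits, &s->guess2_hits);
    }
    /* A guess whose total exceeds best's total becomes the new best. */
    if (s->guess1_cnt > s->best_cnt) {
        swap_int(&s->best, &s->guess1);
        swap_long(&s->best_cnt, &s->guess1_cnt);
        s->guess1_hits = s->guess2_hits = 1;
    }
    /* Cache miss: the new value goes into guess2 and a new cycle starts. */
    if (!hit) {
        s->guess2 = value;
        s->guess1_hits = s->guess2_hits = 1;
    }
}
```

Feeding it 1, 2, 2, 2 leaves best = 2 with best_cnt = 3, and a column of unique values leaves best_cnt = 1. With three or more interleaved values, a guess that is swapped out of guess1 loses its accumulated total (guess2 tracks only hits in the current cycle), which is one way the count becomes "less perfect" as the comment says.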

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
