Re: Strange heuristic in analyze.c

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Strange heuristic in analyze.c
Date: 2010-02-05 20:53:57
Lists: pgsql-hackers

Greg Stark wrote:
> So I never realized the consequences of this little heuristic in
> analyze.c in the handling of very low cardinality columns where we
> want to just capture the complete list of values in the mcv and throw
> away the histogram:
> 		else if (toowide_cnt == 0 && nmultiple == ndistinct)
> 		{
> 			/*
> 			 * Every value in the sample appeared more than once.  Assume the
> 			 * column has just these values.
> 			 */
> 			stats->stadistinct = ndistinct;
> 		}
> The problem with this heuristic is that if the table is small enough,
> you might expect that you can set the statistics target high, "sample"
> the entire table, and get a very accurate mcv covering all the values.
> However, if any value in the table appears only once, this
> heuristic will defeat you. The code that follows will then throw out of
> the mcv any value which isn't 25% more common than "average", leaving
> you with a histogram for those values, which often does very poorly if
> the values don't fit any pattern and are just discrete arbitrary
> values.
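
To make the interaction described above concrete, here is a minimal standalone sketch. It is not the code in analyze.c: the helper names are hypothetical, the distinct-value estimate is passed in as a parameter rather than computed, and the cutoff is reduced to the "25% more common than average" rule as stated in the quoted text. Whether the real comparison is strict is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the quoted condition: the sample is
 * treated as containing every value in the column only if no value was
 * too wide to track and every distinct value appeared more than once.
 */
static bool
sample_looks_complete(int ndistinct, int nmultiple, int toowide_cnt)
{
	return toowide_cnt == 0 && nmultiple == ndistinct;
}

/*
 * Hypothetical helper: simplified MCV cutoff, 1.25 times the average
 * number of sampled rows per distinct value. ndistinct_est stands in
 * for whatever distinct-count estimate is in effect (an assumption).
 */
static double
mcv_min_count(int samplerows, double ndistinct_est)
{
	assert(samplerows > 0 && ndistinct_est > 0.0);
	return 1.25 * ((double) samplerows / ndistinct_est);
}

/* Hypothetical helper: does a value with this sample count survive? */
static bool
kept_in_mcv(int value_count, int samplerows, double ndistinct_est)
{
	return (double) value_count >= mcv_min_count(samplerows, ndistinct_est);
}
```

As an illustration, take a 100-row sample with three values seen 50, 49, and 1 times. Then ndistinct is 3 but nmultiple is 2, so sample_looks_complete() returns false. If the distinct estimate stays at 3, the cutoff is about 41.7 rows, so the two common values stay in the mcv and the value seen once is dropped, even though the whole table was read.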

Do you want a C comment to document this problem?

  Bruce Momjian  <bruce(at)momjian(dot)us>

  + If your life is a hard drive, Christ can be your backup. +
