
Strange heuristic in analyze.c

From: Greg Stark <stark(at)mit(dot)edu>
To: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Strange heuristic in analyze.c
Date: 2010-01-29 16:38:10
Message-ID: 407d949e1001290838t247fd8e6kac432453ed1d910a@mail.gmail.com
Lists: pgsql-hackers
So I never realized the consequences of this little heuristic in
analyze.c for very-low-cardinality columns, where we want to capture
the complete list of values in the MCV list and throw away the
histogram:

		else if (toowide_cnt == 0 && nmultiple == ndistinct)
		{
			/*
			 * Every value in the sample appeared more than once.  Assume the
			 * column has just these values.
			 */
			stats->stadistinct = ndistinct;
		}

The problem with this heuristic is that if the table is small enough,
you might expect to set the statistics target high, "sample" the
entire table, and get a very accurate MCV list covering all the
values. However, if even one value in the table appears only once,
this heuristic will defeat you: nmultiple no longer equals ndistinct.
The code that follows will then throw out of the MCV list any value
that isn't at least 25% more common than "average", leaving you with
a histogram for those values. That histogram often does very poorly
when the values don't fit any pattern and are just arbitrary discrete
values.


-- 
greg

