Skip site navigation (1) Skip section navigation (2)

Re: Searching for the cause of a bad plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Searching for the cause of a bad plan
Date: 2007-09-27 14:40:25
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Csaba Nagy <nagy(at)ecircle-ag(dot)com> writes:
> On Wed, 2007-09-26 at 11:22 -0400, Tom Lane wrote:
>> ... how
>> many values of "a" are there really, and what's the true distribution of
>> counts?  

> table_a has 23366 distinct values. Some statistics (using R):

>> summary(table_a_histogram)
>        a                 count        
>  Min.   : 70000857   Min.   :      1  
>  1st Qu.:700003628   1st Qu.:      9  
>  Median :700011044   Median :     22  
>  Mean   :622429573   Mean   :  17640  
>  3rd Qu.:700018020   3rd Qu.:    391  
>  Max.   :800003349   Max.   :3347707  

Ugh, classic long-tail distribution.  This is a really hard problem to
solve by sampling --- the sample will naturally be biased towards the
more common values, and so ANALYZE tends to conclude there are fewer
distinct values than there really are.  That means n_distinct in the
stats is too small, and that feeds directly into the misestimation of
the number of matching rows.

And yet there's another trap here: if the parameter you passed in
chanced to be one of the very common values, a plan that was optimized
for a small number of matches would perform terribly.

We've speculated about trying to deal with these types of situations
by switching plans on-the-fly at runtime, but that's just blue-sky
dreaming at the moment.  In the short run, if boosting the stats target
doesn't result in acceptable plans, there may be no real solution other
than to avoid parameterized queries on this column.

			regards, tom lane

In response to


pgsql-performance by date

Next:From: Csaba NagyDate: 2007-09-27 15:16:06
Subject: Re: Searching for the cause of a bad plan
Previous:From: Heikki LinnakangasDate: 2007-09-27 12:39:07
Subject: Re: Difference in query plan when using = or > in where clause

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group