
Re: [PATCHES] Better default_statistics_target

From: Decibel! <decibel(at)decibel(dot)org>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Better default_statistics_target
Date: 2008-01-30 22:58:48
Message-ID: 20080130225848.GE1212@decibel.org
Lists: pgsql-hackers, pgsql-patches
On Mon, Jan 28, 2008 at 11:14:05PM +0000, Christopher Browne wrote:
> On Dec 6, 2007 6:28 PM, Decibel! <decibel(at)decibel(dot)org> wrote:
> > FWIW, I've never seen anything but a performance increase or no change
> > when going from 10 to 100. In most cases there's a noticeable
> > improvement since it's common to have over 100k rows in a table, and
> > there's just no way to capture any kind of a real picture of that with
> > only 10 buckets.
> 
> I'd be more inclined to try to do something that was at least somewhat
> data aware.
> 
> The "interesting theory" that I'd like to verify if I had a chance
> would be to run through a by-column tuning using a set of heuristics.
> My "first order approximation" would be:
> 
> - If a column defines a unique key, then we know there will be no
> clustering of values, so no need to increase the count...
> 
> - If a column contains a datestamp, then the distribution of values is
> likely to be temporal, so no need to increase the count...
> 
> - If a column has a highly constrained set of values (e.g. - boolean),
> then we might *decrease* the count.
> 
> - We might run a query that runs across the table, looking at
> frequencies of values, and if it finds a lot of repeated values, we'd
> increase the count.
> 
> That's a bit "hand-wavy," but that could lead to both increases and
> decreases in the histogram sizes.  Given that, we can expect the
> overall stat sizes to not forcibly need to grow *enormously*, because
> we can hope for there to be cases of shrinkage.
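
The per-column knob such heuristics would drive already exists in PostgreSQL. A minimal sketch of applying the rules above by hand (table and column names here are made up for illustration):

```sql
-- Unique key: values never repeat, so the default target is plenty.
ALTER TABLE orders ALTER COLUMN order_id SET STATISTICS 10;

-- Near-boolean column: a couple of MCV entries capture everything; shrink it.
ALTER TABLE orders ALTER COLUMN is_shipped SET STATISTICS 4;

-- Skewed column with many repeated values: grow the MCV list and histogram.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 300;

-- Re-gather statistics at the new per-column targets.
ANALYZE orders;
```

An automated version would just run a profiling pass over pg_stats (or the raw data) and emit statements like these.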

I think that before doing any of that you'd be much better off
investigating how much performance penalty there is for maxing out
default_statistics_target. If, as I suspect, it's essentially 0 on
modern hardware, then I don't think it's worth any more effort.

BTW, that investigation wouldn't just be academic either; if we could
convince ourselves that there normally wasn't any cost associated with a
high default_statistics_target, we could increase the default, which
would reduce the amount of traffic we'd see on -performance about bad
query plans.
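
That measurement doesn't take much setup. A sketch of one way to do it in psql (table and query are hypothetical; 1000 is the maximum target in 8.3):

```sql
\timing on

SET default_statistics_target = 10;
ANALYZE some_big_table;   -- baseline ANALYZE time

SET default_statistics_target = 1000;
ANALYZE some_big_table;   -- cost side: longer sampling, larger pg_statistic rows

-- Planning time also grows with larger MCV lists/histograms, so compare
-- how long EXPLAIN itself takes at each setting:
EXPLAIN SELECT * FROM some_big_table WHERE some_col = 42;
```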
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel(at)decibel(dot)org 
Give your computer some brain candy! www.distributed.net Team #1828

