
Re: Thoughts on statistics for continuously advancing columns

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts on statistics for continuously advancing columns
Date: 2009-12-30 16:31:26
Message-ID: 10a85d0d4575ca892b514be18b9fe1a9@commandprompt.com
Lists: pgsql-hackers
On Wed, 30 Dec 2009 11:16:45 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't have a better idea at the moment :-(
>  
>> It's been a while since I've been bitten by this issue -- the last
>> time was under Sybase.  The Sybase suggestion was to either add
>> "dummy rows" [YUCK!] to set the extreme bounds or to "lie to the
>> optimizer" by fudging the statistics after each generation.  Perhaps
>> we could do better by adding columns for high and low bounds to
>> pg_statistic.  These would not be set by ANALYZE, but
>> user-modifiable to cover exactly this problem?  NULL would mean
>> current behavior?
> 
> Well, the problem Josh has got is exactly that a constant high bound
> doesn't work.
> 
> What I'm wondering about is why he finds that re-running ANALYZE
> isn't an acceptable solution.  It's supposed to be a reasonably
> cheap thing to do.

What makes ANALYZE cheap is two things:

1. It uses read-only bandwidth (for the most part), which is the kind of
bandwidth we have the most of
2. It doesn't take a lock that bothers anything

On the other hand ANALYZE also:

1. Uses lots of memory
2. Uses lots of processor
3. Can take a long time

We normally don't notice because most data sets won't incur a noticeable
penalty. We have a customer with a single table that is over 1TB in size...
we notice. Granted, that is the extreme, but it would only take a quarter of
that size (which is common) to start seeing issues.
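
For what it's worth, the knobs we end up reaching for on a table that size are
roughly the following (table and column names here are made up, just to
illustrate; the idea is to shrink the per-column sample and put the big table
on its own analyze schedule):

    -- see what ANALYZE is actually reading and sampling on the big table
    ANALYZE VERBOSE big_history;

    -- shrink the per-column sample where fine-grained stats don't matter
    ALTER TABLE big_history ALTER COLUMN payload SET STATISTICS 10;

    -- (8.4+) give this one table its own autovacuum analyze threshold
    ALTER TABLE big_history SET (autovacuum_analyze_scale_factor = 0.02);

That helps, but it is still a full sampling pass over a very large table every
time.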

> 
> I think the cleanest solution to this would be to make ANALYZE
> cheaper, perhaps by finding some way for it to work incrementally.
> 

That could be interesting. What about a running statistics set that has some
kind of threshold? What I mean is, we run our normal ANALYZE, but we can mark
a table "HOT" (yeah, bad term). If we mark the table HOT, statistics are
generated on the fly for the planner and updated every X interval. Perhaps
then written out at a checkpoint?

This is just off the top of my head.
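
To make it a bit more concrete: the closest approximation today is to re-run a
cheap, targeted ANALYZE on the hot column(s) from cron far more often than
autovacuum ever would, and accept that the histogram bound goes stale in
between. The names below are invented, just a sketch:

    -- run every few minutes from cron for the "HOT" table:
    ANALYZE hot_table (advancing_ts);

    -- the planner's current idea of the column's range; the upper bound is
    -- what goes stale between runs on a continuously advancing column:
    SELECT histogram_bounds
      FROM pg_stats
     WHERE tablename = 'hot_table' AND attname = 'advancing_ts';

What I am suggesting is essentially having the backend keep that up to date on
its own for tables marked this way, instead of driving it from outside.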

JD

> 			regards, tom lane

-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997
