Re: Thoughts on statistics for continuously advancing columns

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Thoughts on statistics for continuously advancing columns
Date: 2009-12-30 18:42:38
Message-ID: 407d949e0912301042w3e1c14b6s5247587c5fc07ba5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 30, 2009 at 4:31 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> On the other hand ANALYZE also:
>
> 1. Uses lots of memory
> 2. Lots of processor
> 3. Can take a long time
>
> We normally don't notice because most sets won't incur a penalty. We got a
> customer who
> has a single table that is over 1TB in size... We notice. Granted that is
> the extreme
> but it would only take a quarter of that size (which is common) to start
> seeing issues.

I'm a bit puzzled by people's repeated suggestion here that large
tables take a long time to analyze. The sample analyze takes to
generate statistics is not heavily influenced by the size of the
table. Your 1TB table should take basically the same amount of time as
a 1GB table or a 1MB table (if it wasn't already in cache).

Unless the reason why it's 1TB is that the columns are extremely wide
rather than that it has a lot of rows? Or unless you've raised the
statistics target in (a misguided*) belief that larger tables require
larger statistics targets to achieve the same level of accuracy. Or
unless when you say "ANALYZE" you're really running "VACUUM ANALYZE".

[*] except for ndistinct estimates :(

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-12-30 18:56:31 Re: Cancelling idle in transaction state
Previous Message Tom Lane 2009-12-30 18:24:16 Re: Thoughts on statistics for continuously advancing columns