Skip site navigation (1) Skip section navigation (2)

How exactly does Analyze work?

From: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: How exactly does Analyze work?
Date: 2009-11-25 12:34:26
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Dear All,

Thanks very much for your help so far. My understanding of PG is getting 
a lot better!

I wonder if I've understood analyze properly: I'm not sure I quite 
understand how specific the statistics gathered actually are.

In particular, what happens in the following case:
   1. I start with have a table with 100 million rows, and column wid has
      linearly distributed values from 45-90.  (wid is indexed)

   2. I run vacuum analyze

   3. I insert about 2 million rows, all of which have the new wid of 91.

   4. I then do a select * WHERE wid = 91.

How smart is analyze? Will it actually say "well, I've never seen 91 in 
this table, because all the values only go up to 90, so you'd better do 
a sequential scan"?


On another note, I notice that if I ever manually run vacuum or analyze, 
the performance of the database drops to the point where many of the 
operators get kicked out. Is there any way to run them "nice" ?

We need to maintain a response time of under 1 second all day for simple 
queries (which usually run in about 22ms). But Vacuum or Analyze seem to 
lock up the system for a few minutes, during which other queries block 
on them, although there is still plenty of CPU spare.


Also, I find that, even with the autovacuum daemon running, there was 
one query last night that I had to terminate after an hour. In 
desperation, I restarted postgres, let it take 15 mins to vacuum the 
entire DB, and then re-ran the query (in 8 minutes)

Any ideas how I can troubleshoot this better? The database is only 30GB 
in total - it should (if my intuition is right) be impossible that any 
simple select (even over a modestly complex view) should take longer 
than a multiple of the time required to read all the data from disk?

Thanks very much,



pgsql-performance by date

Next:From: Robert HaasDate: 2009-11-25 14:40:00
Subject: Re: Best possible way to insert and get returned ids
Previous:From: Richard NeillDate: 2009-11-25 12:27:28
Subject: Re: Query times change by orders of magnitude as DB ages

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group