How exactly does Analyze work?

From: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: How exactly does Analyze work?
Date: 2009-11-25 12:34:26
Message-ID: 4B0D2452.2080606@cam.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear All,

Thanks very much for your help so far. My understanding of PG is getting
a lot better!

I wonder if I've understood analyze properly: I'm not sure I quite
understand how specific the statistics gathered actually are.

In particular, what happens in the following case:
1. I start with have a table with 100 million rows, and column wid has
linearly distributed values from 45-90. (wid is indexed)

2. I run vacuum analyze

3. I insert about 2 million rows, all of which have the new wid of 91.

4. I then do a select * WHERE wid = 91.

How smart is analyze? Will it actually say "well, I've never seen 91 in
this table, because all the values only go up to 90, so you'd better do
a sequential scan"?

-----

On another note, I notice that if I ever manually run vacuum or analyze,
the performance of the database drops to the point where many of the
operators get kicked out. Is there any way to run them "nice" ?

We need to maintain a response time of under 1 second all day for simple
queries (which usually run in about 22ms). But Vacuum or Analyze seem to
lock up the system for a few minutes, during which other queries block
on them, although there is still plenty of CPU spare.

-----

Also, I find that, even with the autovacuum daemon running, there was
one query last night that I had to terminate after an hour. In
desperation, I restarted postgres, let it take 15 mins to vacuum the
entire DB, and then re-ran the query (in 8 minutes)

Any ideas how I can troubleshoot this better? The database is only 30GB
in total - it should (if my intuition is right) be impossible that any
simple select (even over a modestly complex view) should take longer
than a multiple of the time required to read all the data from disk?

Thanks very much,

Richard

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-11-25 14:40:00 Re: Best possible way to insert and get returned ids
Previous Message Richard Neill 2009-11-25 12:27:28 Re: Query times change by orders of magnitude as DB ages