Re: VACUUM and ANALYZE Follow-Up

From: "Mark Dexter" <MDEXTER(at)dexterchaney(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Cc: "Mark Dexter" <MDEXTER(at)dexterchaney(dot)com>
Subject: Re: VACUUM and ANALYZE Follow-Up
Date: 2004-11-29 22:57:28
Message-ID: 5E8F9F5B63726C48836757FE673B584E0121599A@dcimail.dexterchaney.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hmm... it seems that we're maybe not understanding one another here.
I'm going to try to be more clear. Below are the questions that were
raised and my attemp to answer them clearly.

1. Why run VACUUM on an empty table?

This is a good question, but perhaps there is a valid answer. Our
production database contains about 1500 tables. It is MUCH more
convenient to be able to VACCUM ANALYZE the entire database than to list
specific tables. Furthermore, our application uses "work" tables that
often will be empty (e.g., at night) but that, at times, will contain a
large number of rows. The Postgres documentation says the following:
"We recommend that active production databases be vacuumed frequently
(at least nightly), in order to remove expired rows.". This is going to
be difficult without messing up the performance.

2. Hang on, it's an empty table. Is it supposed to fabricate these
statistics out of thin air? Any made up numbers will probably be worse
than none at all.

Well, that's why I suggested some type of command line option so the
user could give it a number of rows to use for the analysis (e.g.,
ANALYZE MINIMUM 1000).

Another point of interest: If I DROP and the CREATE the table, without
doing ANALYZE, I get good performance when inserting rows. So whatever
assumptions the database is making about a newly-created table appear to
be different (and BETTER) than the assumptions made when doing ANALYZE
on an empty table. It's not clear to me why this should be. In both
cases, you don't really know anything about the table other than at this
moment it has zero rows. Obviously, it would be better (at least in
this instance) if running ANALYZE on an empty table had the same
performance result as using CREATE to make a new (empty) table.

Finally, my testing would seem to contradict that any made-up number
will be better than none at all. In my testing (inserting 35,000 rows
into an empty table), I could only measure two distinct outcomes -- one
good and one bad (with a 15X performance difference). I got good
performance with any of the folloiwng scenarios: CREATE TABLE, ANALYZE
or VACUUM with more than 94 rows in the table. I got bad performance if
I did ANALYZE or VACUUM with less than 94 rows in the table. I could
not measure any difference between other numbers of rows (between 0 and
35,000). So I don't think in practice it is that sensitive, at least in
the simple test case I was doing.

Below are two additional questions I have.

3. Is there some benefit to having ANALYZE behave the way it now does on
empty or nearly empty tables? Is there a large performance improvement
for really small tables (e.g., under 100 rows or under 1000 rows)? Does
anyone really care about performance for small tables?

4. Isn't ANALYZE on a totally empty table really a special case? The
presumption should be that the table will not remain empty. To optimize
the performance assuming that there will be zero (or close to zero) rows
seems somewhat pointless. However, there are valid reasons why a table
might be empty at the moment in time when the ANALYZE is run. (In our
case, we use "work" tables that get cleared at the end of an application
process.) And, as mentioned above, it is easier to VACUUM ANALYZE an
entire database than it is to list tables individually.

5. Why does DROP / CREATE work better than TRUNCATE / VACUUM in terms of
creating a fresh table in which to insert new rows? Is this desirable?
In both cases, the optimizer doesn't really know anything about what to
expect for the table. But CREATE provides a better starting point for
inserts than does VACUUM or ANALYZE, at least in my testing.

I am relatively new to Postgres, and I apologize if I'm repeating issues
that have been raised before. However, it does seem to me to be an
important issue. The easier it is to maintain a Postgres database, the
more likely it is to be widely used. Thanks. Mark

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2004-11-29 23:09:25 Re: VACUUM and ANALYZE Follow-Up
Previous Message Karsten Hilbert 2004-11-29 22:21:30 Re: VACUUM and ANALYZE Follow-Up