Re: pg_autovacuum

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_autovacuum
Date: 2004-03-23 05:32:04
Message-ID: 200403222132.04732.josh@agliodbs.com
Lists: pgsql-hackers

Matt,

> So interesting, most users request the per table settings, guess there
> is sufficient demand for both.

Our reason is that multi-database installations generally use each database
for a significantly different purpose; for example, a "reporting" database
on the same server may need no vacuuming at all.

> You might be missing the point, the advantage of using pg_autovacuum is
> that it wouldn't waste cycles doing vacuums on tables that don't need
> it. If we have persistent data (saving state information periodically)
> then this is a very easy feature to add.

OK, I can see that.

> What I'm thinking is that the VACUUM command could be modified to write
> down some data from the stats system at vacuum time. Once the VACUUM
> command writes this down for itself then pg_autovacuum just uses that
> number to make its decision. Again, we are trying to reduce as much as
> possible superfluous vacuums. If an admin vacuums his whole cluster
> every Sunday night that may prevent lots of vacuums occurring during
> business hours that affect processing.

That would be nice, yes. However, my experience is that mixing manual vacuums
and autovacuums is bound to lead to endless support requests, because
conflicts *will* arise. So in some ways you'd be working to please those who
can't be pleased.

> Backend integration should solve the 1st issue. Parallel vacuums is
> something that could be worked on at some point. Would it make sense
> to incorporate this with tablespaces? The vacuum daemon would only
> issue one vacuum command per tablespace, but could issue as many
> parallel vacuums as you have independent tablespaces.

Hmmm ... that's an interesting idea. I'd been thinking more about vacuums
of small tables, where a high-end server under low load could vacuum several
tables in parallel, one per CPU. However, working through tablespaces would
make a lot of sense.
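
To make the tablespace idea concrete, here is a rough sketch of the
scheduling only: tables are grouped by tablespace, each group is vacuumed
serially, and the groups run in parallel. Everything in it is an assumption
for illustration; `group_by_tablespace`, `vacuum_all` and the `vacuum_table`
callback are hypothetical names, not anything in pg_autovacuum.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_tablespace(tables):
    """Map tablespace name -> list of table names, preserving input order."""
    groups = defaultdict(list)
    for table_name, tablespace in tables:
        groups[tablespace].append(table_name)
    return dict(groups)


def vacuum_all(tables, vacuum_table):
    """Run at most one vacuum at a time per tablespace, tablespaces in parallel.

    `vacuum_table` is a hypothetical callback that vacuums one table and
    returns some result; it stands in for issuing a real VACUUM command.
    """
    groups = group_by_tablespace(tables)

    def drain(table_names):
        # Tables sharing a tablespace are vacuumed one after another.
        return [vacuum_table(name) for name in table_names]

    # One worker per tablespace, so independent tablespaces do not wait
    # on each other.
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        results = list(pool.map(drain, groups.values()))
    return dict(zip(groups.keys(), results))
```

The one-per-CPU variant for small tables would only change how the groups
are formed, not the loop.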

> I think timeout issue would need to be a part of vacuum proper, and I'm
> not sure about the "backing up" issue.

Well, we've discussed timeout for vacuum.

Thing is, autovacuum changes the equation somewhat. Imagine that the
transaction rate of your tables accelerates so that autovacuum with a 0.3
scale setting is triggered every 23 minutes. But say that it takes 29
minutes to vacuum through all of your tables ... or even 49 minutes if you
have "slow vacuum" turned on!

You would get into a cycle where you are running vacuum continuously, all the
time. This is a very bad situation and the admin should be warned about it
via the logs.
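
As a sketch of the check I have in mind (the function names and message
wording are made up for illustration, and how the daemon would measure the
two durations is left open), the warning boils down to comparing how long a
full pass takes against how often one is triggered:

```python
def vacuum_is_saturated(trigger_interval_min, vacuum_duration_min):
    """True when one vacuum pass takes at least as long as the gap between triggers."""
    return vacuum_duration_min >= trigger_interval_min


def saturation_warning(trigger_interval_min, vacuum_duration_min):
    """Return a log message if vacuum would run back to back, else None."""
    if not vacuum_is_saturated(trigger_interval_min, vacuum_duration_min):
        return None
    return (
        "WARNING: vacuum pass takes %d min but is triggered every %d min; "
        "vacuum is running continuously"
        % (vacuum_duration_min, trigger_interval_min)
    )
```

With the numbers above, `saturation_warning(23, 29)` and
`saturation_warning(23, 49)` both produce a warning, while a pass that
finishes inside the trigger interval produces none.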

Hmmm ... thinking about that, are we changing the defaults for threshold and
scale? You and I have discussed this, yes?

> The reason it's similar is that once pg_autovacuum data is persistent,
> it would be trivial to implement this feature, and the data that any
> tool would need to make these decisions is the same as what
> pg_autovacuum is already tracking.

Well, if it's easy to do, then go for it. I can see how some would find it
useful. Once it's sufficiently bulletproof, it could replace the standard
VACUUM (whole db).

> I think the patch was submitted to either the hackers or patches list.
> If you can't find it, I'll look around and see if I still have a copy.
> The person who submitted said it was simple, but was working for him in
> production.

Thanks for the forward.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
