Re: Vacuum best practice: cronjob or autovacuum?

From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Vacuum best practice: cronjob or autovacuum?
Date: 2008-08-28 15:27:09
Message-ID: 20080828112709.a31eb988.wmoran@collaborativefusion.com
Lists: pgsql-general

In response to Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>:
>
> On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> > On our database of about 5GB we vacuum all of our 12 tables (only one
> > is huge, all others have about 100,000 rows or so) every hour or so.
>
> if you refer to manual VACUUM or VACUUM FULL, every hour is probably too
> much. You should aim your vacuum full for about once per week.

This statement is ambiguous. Advice about VACUUM is _very_ different
than advice about VACUUM FULL.

Running a normal VACUUM once an hour may be exactly what you need, but
you haven't given enough information to say one way or the other. Look
into the system catalog tables to see how much bloat your tables and
indexes have, and track that over time to see whether autovac and/or your
cron vacuum is keeping things in check.
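
One crude way to do that tracking (just a sketch; the "size_history"
table and its columns are made up for illustration, nothing standard):

  create table size_history (
      snapped   timestamptz,
      relname   name,
      relkind   "char",
      relpages  integer,
      reltuples real
  );

  -- take a snapshot from cron or by hand, then compare across days
  insert into size_history
  select now(), relname, relkind, relpages, reltuples
  from pg_class
  where relkind in ('r', 'i') and relname not like 'pg_%';

A relation whose relpages keeps climbing between snapshots while its
reltuples stays roughly flat is one that vacuum isn't keeping in check.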

> > But we also have autovacuum enabled. Is this okay? Do the two vacuum
> > processes contradict each other, or add unnecessary load to the
> > system?

If your cron jobs are vacuuming enough, then autovac will simply
take a few CPU cycles to realize that nothing needs to be done.

> you'll find that once in a while (start at once/week and build up or
> down from there) you can/should:
>
> - vacuum full
> - reindex your tables
> - reindex your indexes

Whoa there. How on earth did you derive that from those documents?
That's a pretty lousy maintenance plan with lots of unneeded overhead,
unless your database has a very edge-case access pattern.

VACUUM FULL really only needs to be done in extreme cases where massive
bloat has occurred and is not expected to happen again. If massive bloat
is routine, you're probably better off letting ordinary VACUUM maintain
a consistent level of free space in the tables.
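
To make the distinction concrete (the table name below is hypothetical):

  -- plain VACUUM: marks dead row versions as reusable free space and
  -- runs alongside normal reads and writes; fine for routine use
  vacuum analyze hits;

  -- VACUUM FULL: compacts the table and hands space back to the OS,
  -- but holds an exclusive lock the whole time; reserve it for one-off
  -- cleanup after unusual bloat
  vacuum full hits;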

REINDEXING seems nice at times, but I've yet to see any evidence that
it's necessary. There are probably some corner cases, but I've not seen
them documented. I tried to document what I thought was a corner case
once, and was unable to come up with anything conclusive.

I don't even know what you mean by "reindex your indexes".

> > The reason we introduced the cronjob we felt was that the autovaccuum
> > was not really doing its job.
>
> how did you realise that? turn off the cron job, wait a few days. In
> the meanwhile monitor your disk space occupation (du
> -sh /var/lib?/pgsql....????/base/)

I agree with most of this, although it'll probably be easier to look into
the PostgreSQL system catalogs to watch the actual table and index sizes:
select relname, relkind, relpages from pg_class
where relkind in ('r', 'i') and relname not like 'pg_%' order by relname;

You may find that autovacuum handles everything well except for one or
two tables, in which case you can optimize your cron job to vacuum just
those tables.
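
For instance (table, database and schedule below are placeholders, not
anything from this thread), a system crontab entry for that might be:

  # hourly plain VACUUM ANALYZE on the one table autovac can't keep up
  # with; everything else is left to autovacuum
  0 * * * *  postgres  vacuumdb --analyze --table=hits yourdb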

> > I wonder if anyone can share some
> > insight on whether these settings are good for a DB that is basically
> > 24x7:
>
> like someone said: it's not the 24x7. it's how many tuples get
> DELETEd or UPDATEd (in the case of autovacuum) in one day, for
> example.

The 24/7 aspect _is_ part of the equation. VACUUM FULL takes out an
exclusive lock while doing its work, which is not appropriate for 24/7
operation. Routine VACUUM FULL is _never_ appropriate for a 24/7
operation (on top of the fact that it's seldom appropriate at all).

[snip lots of information that is good and I have nothing to add to]

> In my case I have autovacuum with scale factors 0 and naptime 600; also
> a cron job for vacuum full and reindex everything once a week (during
> the night). It's working fine on a db of about 2 GB with an average of
> 10000 DELETEs a day and well above 200000 INSERTs/UPDATEs per day.

By this comment you're implying that the DB has down time at night, which
means you can do things the original poster can't: he does _not_ want to
run VACUUM FULL at night if he wants to maintain 24/7 operation.

That being said, I still doubt that VACUUM FULL is necessary. Have you
_tried_ running with normal vacuum alone? Have you benchmarked the system
to demonstrate that VACUUM FULL is really helping anything?

That said, if you have a big outage window every week where you can
afford to run whatever you want without interfering with other users, feel
free to continue as you are. It won't cause you any problems.

The naptime of 600 is probably a bad idea. Even if you only have one user
database on this system, autovac works through the databases one at a
time (template0 ... template1 ... postgres ... yourdb), so yours only
gets investigated about once every 40 minutes. Consider that autovac uses
very few resources when it determines that it has no work to do, so
there's nothing gained by such a long naptime.
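
For comparison, the relevant postgresql.conf settings (illustrative
values, not a recommendation from this thread) would be something like:

  # autovac's checks are cheap when there is nothing to do, so there is
  # little to gain from stretching the naptime out to 600 seconds
  autovacuum = on
  autovacuum_naptime = 60        # the shipped default is one minute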

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023
