Skip site navigation (1) Skip section navigation (2)

Re: Vaccuum best practice: cronjob or autovaccuum?

From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Vaccuum best practice: cronjob or autovaccuum?
Date: 2008-08-28 16:47:59
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-general
In response to Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>:
> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, despite it grows back after a few
> hours. It's just re-assuring to me to make sure that at least one of the
> vacuums it's working when I see the weekly drop of disk occupation.

Note the "it grows back after a few hours" is unneeded load on your
system.  You just spent a lot of IO shrinking a file that PG is now going
to spend a lot of IO re-enlarging.  Thus you made PG work harder for a
while after the VACUUM FULL in order to get the table back to a stable

> quoting:
> ---------------------------------------------
> The potential for bloat in non-B-tree indexes has not been well
> characterized.
> ----------------------------------------------------
> -------------------------------------------------
> It is a good idea to keep an eye on the index's physical size when using
> any non-B-tree index type. 
> ----------------------------------------------------

Both special cases.  B-tree's are default.

> ----------------------------------------------------
> Also, for B-tree indexes a freshly-constructed index is somewhat faster
> to access than one that has been updated many times, because logically
> adjacent pages are usually also physically adjacent in a newly built
> index. (This consideration does not currently apply to non-B-tree
> indexes.) It might be worthwhile to reindex periodically just to improve
> access speed. 
> ----------------------------------------------------------
> I did some tests (lets say about 50 INSERT/UPDATES per second and somw
> thousande DELETEs once in a while).
> Query execution time DROPs 10 times after REINDEXING and VACUUM

This would be interesting to this conversation if you _only_ did the
REINDEX.  The addition of the VACUUM makes it ambiguous as to which
command actually made the improvement.

> I've seen 600Megas DROP in disk size ocupation just by reindexing;

Is that consistent, or was it an extraordinary case?  It's pretty amazing
to imagine 600M of index bloat on a 100M database ... how many indexes do
you have?  Are you sure you don't have duplicate indexes or some other

> additionally about 500 Megas drop by VACCUMING; this was on a 100Megas
> of usefull data.

It's probably because you're only vacuuming once a week.

> > That's a pretty lousy maintenance plan with lots of unneeded overhead,
> > unless you're database has a very edge-case access pattern.
> !? Many people mention these 'edeg-case access pattern', and 'corner
> cases'. I don't now if mine is such. My appliucation executes patterns
> of varying INSERTING and UPDATING but it must overcome stress tests with
> about 100 UPDATES mixed with INSERTS per second. Additionaly it DELETEs
> up to 10000 records once every night.
> It's a 'near-realtime' log store which stores log information for 'you
> name it' logs in Linux based systems. It also removes old data once a
> night.

That is a fairly edge-case use that's actually documented in the docs you
reference to be likely to cause abnormal amounts of bloat.  However, it's
just one index on one table (unless you have a bunch of indexes on that
table that are all continually increasing?)  Unless there's something I'm
missing, it hardly justifies reindexing the entire database.

> > VACUUM FULL really only needs done in extreme cases where massive data
> > bloat is experienced, and not expected to happen again.  If massive data
> > bloat is routine, you're probably better off letting ordinary VACUUM
> > maintain a consistent level of free space on tables.
> well. I've seen massive data and index bloating in my application. In
> some cases REINDEXING and VACUUM FULL ing was the only way to clear up
> things.

It's possible that the stable size for you tables has more free space than
actual data.  While that seems crazy, it's really no different than having
a file server with 4x as much disk space as people normally are using.

> > REINDEXING seems nice at times, but I've yet to see any evidence that
> > it's necessary.  There are probably some corner cases, but I've not seen
> > them documented. 
> neither did I. REINDEXING helps me keep query execution time low.
> Othewise it will increase to levels I cannot have. We use 'modest'
> processors and 'modest' storage, in highly dependable embedded systems.

Oh ... well this might change a lot.  Depending on what you mean by
"embedded", you may be using experience that's _very_ specialized to
make generalizations from.

The RAM/CPU/storage/etc constraints on anything I would call "embedded"
are different than just about anyone else would be dealing with for
tuning a DB server.  If you're looking to tune PG to run well on an
embedded platform, my advice would be to throw out everything you've
heard and test extensively for yourself -- which sounds like what
you've done.

Bill Moran
Collaborative Fusion Inc.

Phone: 412-422-3463x4023

In response to

pgsql-general by date

Next:From: Shane AmblerDate: 2008-08-28 16:57:35
Subject: Re: Restoring a database from a file system snapshot
Previous:From: John T. DowDate: 2008-08-28 16:35:50
Subject: Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group