Re: Vaccuum best practice: cronjob or autovaccuum?

From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Vaccuum best practice: cronjob or autovaccuum?
Date: 2008-08-28 16:47:59
Message-ID: 20080828124759.f5b46961.wmoran@collaborativefusion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In response to Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>:
>
> When autovacuum was introduced, I kept the weekly VACUUM FULL because it
> effectively brings disk occupation down, despite it grows back after a few
> hours. It's just re-assuring to me to make sure that at least one of the
> vacuums it's working when I see the weekly drop of disk occupation.

Note the "it grows back after a few hours" is unneeded load on your
system. You just spent a lot of IO shrinking a file that PG is now going
to spend a lot of IO re-enlarging. Thus you made PG work harder for a
while after the VACUUM FULL in order to get the table back to a stable
size.

> quoting:
>
> http://www.postgresql.org/docs/8.3/static/routine-reindex.html
>
> ---------------------------------------------
> The potential for bloat in non-B-tree indexes has not been well
> characterized.
> ----------------------------------------------------
>
> -------------------------------------------------
> It is a good idea to keep an eye on the index's physical size when using
> any non-B-tree index type.
> ----------------------------------------------------

Both special cases. B-tree's are default.

> ----------------------------------------------------
> Also, for B-tree indexes a freshly-constructed index is somewhat faster
> to access than one that has been updated many times, because logically
> adjacent pages are usually also physically adjacent in a newly built
> index. (This consideration does not currently apply to non-B-tree
> indexes.) It might be worthwhile to reindex periodically just to improve
> access speed.
> ----------------------------------------------------------
>
> I did some tests (lets say about 50 INSERT/UPDATES per second and somw
> thousande DELETEs once in a while).
>
> Query execution time DROPs 10 times after REINDEXING and VACUUM

This would be interesting to this conversation if you _only_ did the
REINDEX. The addition of the VACUUM makes it ambiguous as to which
command actually made the improvement.

> I've seen 600Megas DROP in disk size ocupation just by reindexing;

Is that consistent, or was it an extraordinary case? It's pretty amazing
to imagine 600M of index bloat on a 100M database ... how many indexes do
you have? Are you sure you don't have duplicate indexes or some other
issue?

> additionally about 500 Megas drop by VACCUMING; this was on a 100Megas
> of usefull data.

It's probably because you're only vacuuming once a week.

> > That's a pretty lousy maintenance plan with lots of unneeded overhead,
> > unless you're database has a very edge-case access pattern.
>
> !? Many people mention these 'edeg-case access pattern', and 'corner
> cases'. I don't now if mine is such. My appliucation executes patterns
> of varying INSERTING and UPDATING but it must overcome stress tests with
> about 100 UPDATES mixed with INSERTS per second. Additionaly it DELETEs
> up to 10000 records once every night.
>
> It's a 'near-realtime' log store which stores log information for 'you
> name it' logs in Linux based systems. It also removes old data once a
> night.

That is a fairly edge-case use that's actually documented in the docs you
reference to be likely to cause abnormal amounts of bloat. However, it's
just one index on one table (unless you have a bunch of indexes on that
table that are all continually increasing?) Unless there's something I'm
missing, it hardly justifies reindexing the entire database.

> > VACUUM FULL really only needs done in extreme cases where massive data
> > bloat is experienced, and not expected to happen again. If massive data
> > bloat is routine, you're probably better off letting ordinary VACUUM
> > maintain a consistent level of free space on tables.
>
> well. I've seen massive data and index bloating in my application. In
> some cases REINDEXING and VACUUM FULL ing was the only way to clear up
> things.

It's possible that the stable size for you tables has more free space than
actual data. While that seems crazy, it's really no different than having
a file server with 4x as much disk space as people normally are using.

> > REINDEXING seems nice at times, but I've yet to see any evidence that
> > it's necessary. There are probably some corner cases, but I've not seen
> > them documented.
>
> neither did I. REINDEXING helps me keep query execution time low.
> Othewise it will increase to levels I cannot have. We use 'modest'
> processors and 'modest' storage, in highly dependable embedded systems.

Oh ... well this might change a lot. Depending on what you mean by
"embedded", you may be using experience that's _very_ specialized to
make generalizations from.

The RAM/CPU/storage/etc constraints on anything I would call "embedded"
are different than just about anyone else would be dealing with for
tuning a DB server. If you're looking to tune PG to run well on an
embedded platform, my advice would be to throw out everything you've
heard and test extensively for yourself -- which sounds like what
you've done.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Shane Ambler 2008-08-28 16:57:35 Re: Restoring a database from a file system snapshot
Previous Message John T. Dow 2008-08-28 16:35:50 Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc