Re: Vaccuum best practice: cronjob or autovaccuum?

From: Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>
To: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Vaccuum best practice: cronjob or autovaccuum?
Date: 2008-08-28 16:15:06
Message-ID: 1219940106.9846.69.camel@jmf-ubuntu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I'dd like to apologise about my short knowledge of VACUUM FULL and
REINDEX.

I'm just stating what I do in my case. I don not know if it is a corner
case or not.

I've been dealing with this specific application which is very demanding
for Postgres for about 2 years.

When autovacuum was introduced, I kept the weekly VACUUM FULL because it
efectively brings disk ocupatio down, dispite it grows back after a few
hours. It's just re-assuring to me to make sure that at least one of the
vacuums it's working when I see the weekly drop of disk ocupation.

On Thu, 2008-08-28 at 11:27 -0400, Bill Moran wrote:
> In response to Joao Ferreira gmail <joao(dot)miguel(dot)c(dot)ferreira(at)gmail(dot)com>:
> >
> > On Thu, 2008-08-28 at 19:53 +0800, Phoenix Kiula wrote:
> > > On our database of about 5GB we vaccuum all of our 12 tables (only one
> > > is huge, all others have about 100,000 rows or so) every hour or so.
> >
> > if you refer to manual VACUUM or VACUUM FULL every hour is probably too
> > much. You should aim your vacuum full for about 1ce per week.
>
> This statement is ambiguous. Advice about VACUUM is _very_ different
> than advice about VACUUM FULL.
>
> Running a normal VACUUM once an hour may be exactly what you need, but
> you didn't give enough evidence one way or the other. Look into the
> various system catalog tables to see how much bloat your tables and
> indexes have and track that over time to see if autovac and/or your
> cron vacuum is keeping things in check.
>
> > > But we also have autovaccuum enabled. Is this okay? Do the two vaccuum
> > > processes contradict each other, or add unnecessary load to the
> > > system?
>
> If you're cron jobs are vacuuming enough, then autovac will simply
> take a few CPU cycles to realize that nothing needs to be done.
>
> > you'll find that once in a while (start at once/week and build up or
> > down from there) you can/should:
> >
> > - vacuum full
> > - reindex your tables
> > - reindex your indexes
> Whoah there. How on earth did you derive that from those documents?

I did not say I did. I just advised the original poster to read the
docs. I'm not trying to say 'this is the way'. I'm trynig to say 'read
the docs; the docs are good'.

but, BTW,

quoting:

http://www.postgresql.org/docs/8.3/static/routine-reindex.html

---------------------------------------------
The potential for bloat in non-B-tree indexes has not been well
characterized.
----------------------------------------------------

-------------------------------------------------
It is a good idea to keep an eye on the index's physical size when using
any non-B-tree index type.
----------------------------------------------------

----------------------------------------------------
Also, for B-tree indexes a freshly-constructed index is somewhat faster
to access than one that has been updated many times, because logically
adjacent pages are usually also physically adjacent in a newly built
index. (This consideration does not currently apply to non-B-tree
indexes.) It might be worthwhile to reindex periodically just to improve
access speed.
----------------------------------------------------------

I did some tests (lets say about 50 INSERT/UPDATES per second and somw
thousande DELETEs once in a while).

Query execution time DROPs 10 times after REINDEXING and VACUUM

I've seen 600Megas DROP in disk size ocupation just by reindexing;
additionally about 500 Megas drop by VACCUMING; this was on a 100Megas
of usefull data.

> That's a pretty lousy maintenance plan with lots of unneeded overhead,
> unless you're database has a very edge-case access pattern.

!? Many people mention these 'edeg-case access pattern', and 'corner
cases'. I don't now if mine is such. My appliucation executes patterns
of varying INSERTING and UPDATING but it must overcome stress tests with
about 100 UPDATES mixed with INSERTS per second. Additionaly it DELETEs
up to 10000 records once every night.

It's a 'near-realtime' log store which stores log information for 'you
name it' logs in Linux based systems. It also removes old data once a
night.
>
> VACUUM FULL really only needs done in extreme cases where massive data
> bloat is experienced, and not expected to happen again. If massive data
> bloat is routine, you're probably better off letting ordinary VACUUM
> maintain a consistent level of free space on tables.

well. I've seen massive data and index bloating in my application. In
some cases REINDEXING and VACUUM FULL ing was the only way to clear up
things.

but you're probably right. I'll give it a try. thx

>
> REINDEXING seems nice at times, but I've yet to see any evidence that
> it's necessary. There are probably some corner cases, but I've not seen
> them documented.

neither did I. REINDEXING helps me keep query execution time low.
Othewise it will increase to levels I cannot have. We use 'modest'
processors and 'modest' storage, in hioghly dependable embedded systems.

> I tried to document what I thought was a corner case
> once, and was unable to come up with anything conclusive.
>
> I don't even know what you mean by "reindex your indexes"

I mean this:

http://www.postgresql.org/docs/8.3/static/sql-reindex.html

this operation, on my table and my indexes, is helping me effectiuvely
keep query execution times down.

>
> > > The reason we introduced the cronjob we felt was that the autovaccuum
> > > was not really doing its job.
> >
> > how did you realise that ? turn off the cron job, wait a few days. In
> > the meanwhile monitor your disk space ocupation (du
> > -sh /var/lib?/pgsql....????/base/)
>
> I agree with most of this, although it'll probably be easier to look into
> PostgreSQL system tables to watch the actual table sizes:
> select relname, relpages from pg_class
> where relkind='i' and relname not like 'pg_%' order by relname;
>
> You may find that autovacuum handles everything well except for one or
> two tables, in which case you can optimize your cron job to vacuum just
> those tables.
>
> > > I wonder if anyone can share some
> > > insight on whether these settings are good for a DB that is basically
> > > 24x7:
> >
> > like someone sayd: it's not the 24x7. it's the: how many tuples get
> > DELETEd or UPDTATEd (for the case of autovacuum) in one day, for
> > example.
>
> The 24/7 aspect _is_ part of the equation.

yes. I agree. sorry for the confusion. I did not mean that 24x7 was not
important. I meant that it is not releveant to the efectiveness of any
of the vacuum types.

And yes I agree that VACUM FULL uses locking which will bring the system
unusable for a certain amount of time.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mason Hale 2008-08-28 16:23:28 Poor planner estimation with partitioned tables
Previous Message Masis, Alexander (US SSA) 2008-08-28 16:14:59 MySQL LAST_INSERT_ID() to Postgres