Skip site navigation (1) Skip section navigation (2)

Re: Auto Vacuum

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Russell Smith <mr-russ(at)pws(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Auto Vacuum
Date: 2004-11-29 02:45:20
Message-ID: Pine.LNX.4.58.0411291340190.23987@linuxworld.com.au (view raw or flat)
Thread:
Lists: pgsql-hackers
Matthew T. O'Connor looked at this fairly closely leading up to 8.0
feature freeze. There was a long discussion earlier this year with respect
to libpq vs. using backend functions directly to vacuum multiple
databases.

http://archives.postgresql.org/pgsql-hackers/2004-03/msg00931.php

This should be mandatory reading before proceeding. Also, you want to talk
to Matthew and review his patch (submitted early July, I think).

Thanks,

Gavin

On Sun, 28 Nov 2004, Bruce Momjian wrote:

>
> I have added an auto-vacuum TODO item:
>
> * Auto-vacuum
>         o Move into the backend code
>         o Scan the buffer cache to find free space or use background writer
>         o Use free-space map information to guide refilling
>
>
> ---------------------------------------------------------------------------
>
> Russell Smith wrote:
> > Hi All,
> >
> > I am doing serious thinking about the implementation of Auto Vacuum as part of the backend, Not using libpq, but classing internal functions directly.
> > It appears to me that calling internal functions directly is a better implementation than using the external library to do the job.
> >
> > I know I might be stepping on Matthew's toes, but I don't really want to.  I am a complete newbie to the postgresql code, however I am trying.
> > Vacuum appears to be one of the bigger saw points with administrator having to configure it via scheduled tasks.
> >
> > I have outlined things I have thought about below.  I've surely missed a lot, and am open to feedback.  Others may like the current tuning used
> > by the external autovacuum, however to have stats that are collected at vacuum, and used for later vacuum would mean we don't need the
> > stats collector running.
> >
> > The major autovacuum issues
> >
> > 1. Transaction Wraparound
> > 2. Vacuum of relations
> > 3. Tracking of when to do vacuums
> > 4. Where to store information needed by auto vacuum
> >
> > I would suggest the first step is to replicate the wraparound and relation vacuuming from the current code first. But I could be wrong.
> > Currently there are a lot of tuning options in the external autovacuum, and the best way to run vacuum internally will need to be thought out.
> >
> > 1. Transaction Wraparound
> >
> > It appears from the code, that the best time to do a transaction wraparound db wide vacuum is when the
> > frozenXID has wrapped, TransactionIdPrecedes(myXID, dbform->datfrozenxid).  And is probably the most
> > simple vacuum to implement.
> >
> >
> > 2. Vacuuming of relations
> >
> > Currently, the entire heap must be vacuumed at one time.  I would possible be desireable to have only part of the relation vacuumed at
> > a time.  If you can find out which parts of the relation have the most slack space.  There is a todo item regarding tracking recent deletions
> > so they can be resused.  Some form of this would be helpful to work out what to vacuum.  Performance issues for this type of activity
> > may be a concern.  But I have no experience to be able to make comment on them.  So I welcome yours.
> >
> > 3. Tracking of when to vacuum
> >
> > Current autovacuum relies the stats collector to be running.  I would like to only use the stats if they are available,
> > and have an option to be able to vacuum accurately without having to have stats running.
> >
> > By adding certain histograms, on tuples, filesize and slack space we can guage the time between vacuums
> >
> > number of tuples will show the inserts as opposed to updates.
> > file size will show that the file is growing and by how much between vacuums.
> > slack space will show the delete/updated records.
> >
> > A new guc and relation option would need to be implemented to give a target slack space in a file.
> > this will help to reduce growth in relations if vacuum happens to not run frequently enough.  This information
> > can also inform autovacuum that it should be vacuuming more frequently.  The number would be a percentage,
> > eg 10% of the total file size is allowed to be unused.  Also alter table would allow users to set levels of slackness
> > for each relation.  If the table gets too much more than the target slack space, a "partial" vacuum full could be run
> > to reduce the size of the table by moving tuples at the end of the table to slack space near the beginning and
> > shortening the table length.  It would require a full table lock, but you may be able to space it out, to only do a page
> > at a time.
> >
> > /* target percentage of slack space */
> > vacuum_default_target_slack_space = 10
> >
> > ALTER TABLE SET TARGET SLACK SPACE = 10;
> >
> > 4. Where to store information required by auto vacuum.
> >
> > Auto vacuum needs somewhere to stop the information it needs about current and future vacuuming.  I am unsure of where
> > to put this.  It appears as if it will have a number of fields.  I feel like pg_class is the best place to put the information, but
> > again I have no idea.
> >
> > That's the best I can do for now.  I can clarify things further if required.
> >
> > Regards
> >
> > Russell.
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> >
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

In response to

pgsql-hackers by date

Next:From: Jim SeymourDate: 2004-11-29 03:13:06
Subject: Re: [HACKERS] Adding Reply-To: <listname> to Lists configuration ...
Previous:From: Bruce MomjianDate: 2004-11-29 02:29:45
Subject: Re: Auto Vacuum

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group