Re: Resumable vacuum proposal and design overview

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: "Galy Lee" <lee(dot)galy(at)oss(dot)ntt(dot)co(dot)jp>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Resumable vacuum proposal and design overview
Date: 2007-02-27 16:49:44
Lists: pgsql-hackers
On Tue, 2007-02-27 at 10:37 -0600, Jim C. Nasby wrote:
> On Tue, Feb 27, 2007 at 11:44:28AM +0900, Galy Lee wrote:
> > For example, there is one table:
> >    - The table is hundreds of GBs in size.
> >    - It takes 4-8 hours to vacuum such a large table.
> >    - Enabling cost-based delay may make it last for 24 hours.
> >    - It can be vacuumed during night time for 2-4 hours.
> > 
> > It is true there is no strict requirement that vacuum
> > needs to be interrupted immediately, but it should be stopped in a
> > *predictable way*. In the above example, if we have to wait for the end
> > of one full cycle of cleaning, it may take up to 8 hours for vacuum to
> > stop after it has received a stop request. This seems quite unacceptable.
>
> Even with very large tables, you could likely still fit things into a
> specific time frame by adjusting how much time is spent scanning for
> dead tuples. The idea would be to give vacuum a target run time, and it
> would monitor how much time it had remaining, taking into account how
> long it should take to scan the indexes based on how long it's been
> taking to scan the heap. When the amount of time left becomes less than
> the estimate of the amount of time required to scan the indexes (and
> clean the heap), you stop the heap scan and start scanning indexes. As
> long as the IO workload on the machine doesn't vary wildly between the
> heap scan and the rest of the vacuum process, I would expect this to
> work out fairly well.
>
> While not as nice as the ability to 'stop on a dime' as Tom puts it,
> this would be much easier and safer to implement. If there's still a
> need for something better after that we could revisit it at that time.

I do like this idea, but it also seems easy to calculate that bit
yourself. Run VACUUM, after X minutes issue stop_vacuum() and see how
long it takes to finish. Adjust X until you have it right.
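
A rough sketch of that manual calibration, assuming the time vacuum needs to
finish after the stop request grows in proportion to the time it spent
scanning the heap before the request. The function name and its parameters
are hypothetical, and stop_vacuum() is only the proposed call, not an
existing one:

```python
def next_stop_after(window_min, x_min, finish_min):
    """Suggest the next value of X (minutes before issuing the stop request).

    window_min: total time available for the vacuum run
    x_min:      X used in the previous trial run
    finish_min: observed time from the stop request until vacuum finished

    Assumption: finish time scales linearly with X, so the ratio seen in
    the previous run is reused to fit the whole run into the window.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    finish_ratio = finish_min / x_min
    return window_min / (1.0 + finish_ratio)
```

For example, if stopping after 100 minutes needed another 50 minutes to
finish, a 240 minute window suggests trying X = 160 next.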

If we did want to automate it, a vacuum_target_duration userset GUC would
make sense, following Jim's thought. A value of 0 would mean run-to-completion.
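
To make the check concrete, here is a minimal sketch of the decision Jim
describes, not actual vacuum code. All names are hypothetical, and it
assumes that index pages and heap pages needing cleanup cost about the same
per page as the heap scan has cost so far:

```python
def should_stop_heap_scan(target_s, elapsed_s, heap_pages_scanned,
                          pages_with_dead_tuples, index_pages):
    """Return True once the remaining time no longer covers the index scan
    plus heap cleanup, estimated from the observed heap scan rate."""
    if target_s == 0:
        return False  # 0 means run-to-completion, never stop early
    if heap_pages_scanned == 0 or elapsed_s <= 0:
        return False  # no scan rate observed yet, keep scanning

    # Observed cost per heap page so far (assumed to hold for the rest).
    secs_per_page = elapsed_s / heap_pages_scanned

    # Estimated cost of the work still required after the heap scan stops.
    est_index_s = index_pages * secs_per_page
    est_cleanup_s = pages_with_dead_tuples * secs_per_page

    remaining_s = target_s - elapsed_s
    return remaining_s < est_index_s + est_cleanup_s
```

The heap scan would call this periodically and switch to scanning indexes
the first time it returns True. As Jim notes, the estimate only holds if the
IO workload does not vary wildly between the phases.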

Getting it to work well for VACUUM FULL would be more than a little
interesting though.

  Simon Riggs             
