Re: Resumable vacuum proposal and design overview

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: "Galy Lee" <lee(dot)galy(at)oss(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Resumable vacuum proposal and design overview
Date: 2007-02-27 16:49:44
Message-ID: 1172594984.3760.765.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2007-02-27 at 10:37 -0600, Jim C. Nasby wrote:
> On Tue, Feb 27, 2007 at 11:44:28AM +0900, Galy Lee wrote:
> > For example, there is one table:
> > - The table is a hundreds GBs table.
> > - It takes 4-8 hours to vacuum such a large table.
> > - Enabling cost-based delay may make it last for 24 hours.
> > - It can be vacuumed during night time for 2-4 hours.
> >
> > It is true there is no such restrict requirement that vacuum
> > need to be interrupt immediately, but it should be stopped in an
> > *predictable way*. In the above example, if we have to wait for the end
> > of one full cycle of cleaning, it may take up to 8 hours for vacuum to
> > stop after it has received stop request. This seems quit unacceptable.
>
> Even with very large tables, you could likely still fit things into a
> specific time frame by adjusting how much time is spent scanning for
> dead tuples. The idea would be to give vacuum a target run time, and it
> would monitor how much time it had remaining, taking into account how
> long it should take to scan the indexes based on how long it's been
> taking to scan the heap. When the amount of time left becomes less than
> the estimate of the amount of time required to scan the indexes (and
> clean the heap), you stop the heap scan and start scanning indexes. As
> long as the IO workload on the machine doesn't vary wildly between the
> heap scan and the rest of the vacuum process, I would expect this to
> work out fairly well.
>
> While not as nice as the ability to 'stop on a dime' as Tom puts it,
> this would be much easier and safer to implement. If there's still a
> need for something better after that we could revisit it at that time.

I do like this idea, but it also seems easy to calculate that bit
yourself. Run VACUUM, after X minutes issue stop_vacuum() and see how
long it takes to finish. Adjust X until you have it right.

If we did want to automate it, vacuum_target_duration userset GUC would
make, following Jim's thought. =0 means run-to-completion.

Getting it to work well for VACUUM FULL would be more than a little
interesting though.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2007-02-27 16:54:32 Re: 7.x horology regression test on Solaris buildfarm machines
Previous Message Jim C. Nasby 2007-02-27 16:49:18 Re: Seeking Google SoC Mentors