We are developing a new feature for vacuum, here is a brief overview
A) What is it?
This feature enables vacuum has resumable capability. Vacuum can
remembers the point it stops, then resumes interrupted vacuum operation
from the point next time.
The SQL syntaxes for this feature has the format similar to:
# VACUUM ALL --> It has the same functionality with current vacuum;
# VACUUM --> It uses the information saved by previous
interrupted vacuum to continue vacuum processing.
B) Why do we need it?
For large table, although it can be vacuum by enabling vacuum cost-based
delay, but the processing may last for several days. It definitely has
negative affect on system performance. So if systems which has
maintenance time, it is preferred to vacuum in maintenance window.
Vacuum task can be split into small subtasks, and they can be
scheduled into maintenance window time slot. This can reduce the impact
of vacuum to system service.
But currently vacuum task can not be split: if an interrupt or error
occurs during vacuum processing, vacuum totally forgets what it has
done and terminates itself. Following vacuum on the same table has to
scan from the beginning of the heap block. This proposal enable vacuum
has capability to stop and resume.
C) How can we use it?
This feature can enable autovacuum or cron-vacuum-scheduler to develop
more sophisticated vacuum schedule schemes combined with *maintenance
For example, if the system has two hour maintenance window every day,
vacuum can be interrupted by this way:
SET statement_timeout TO 2*3600*1000; # two hours
VACUUM freeze talname;
SET statement_timeout TO DEFAULT;
Or it can be interrupted by SIGINT directly.
Autovacuum or manual vacuum scheduler can split a large vacuum task
into small subtasks by this feature. Subtasks can be scheduled according
to system load.
A) Vacuum internal overview
Concurrent vacuum mainly has the following steps to vacuum a table:
1. scan heap to collect dead tuple list
2. (if the table has indexes) scan and sweep indexes
3. sweep dead tuples collected in step 1
4. perform additional index cleanup operation
5. (if a certain of free space found) truncate table
6. register free space to FSM
7. update statistics of the table
If maintenance memory is not sufficient to contain all of dead tuples,
step 1-3 are repeated.
C) Where to stop
The first option is that it can accept stop request at boundary
of each steps, and it resumes from the unfinished step. But some
steps takes a short time, some take a long time. For example,
step 1 takes more than 40% of total time, but step 5 and 6 take
less than 1% of total time. This granularity is too larger.
The second option is to accept stop request before vacuum begin to
process one of the blocks in step 1-4. In current vacuum implementation,
vacuum_delay_point is also placed in such locations. This option has a
much good granularity than option 1.
This implementation accepts stop request at *blocks level* in step 1-4.
D) How to stop and resume
When vacuum stop in step 1-4, vacuum perform following things:
vacuum saves dead tuple list, the heap block number on which it
stop, unswept index relations, unswept dead tuple and FreezeLimit
into a disk file.
Free space information collected in step 1-3 can be registered to
FSM when vacuum is interrupted.
When vacuum is resuming, it reads out saved information and skip the
finished operations, then continue to finish remaining operations.
There are two additional issues which need to be discussed here:
There are two options to select FreezeLimit for a resuming a
vacuum:(a) FreezeLimit of interrupted vacuum, (b) FreezeLimit of
resuming vacuum. FreezeLimit-(b) is safe. But for the heap
blocks are not full scanned, so when FreezeLimit-(b) is used ,
the relfrozenxid should be updated with FreezeLimit-(a) at the
end of vacuum, and CLOG can only be truncated by FreezeLimit-(a).
*) Resuming index operation.
There are two possible resuming levels when vacuum is interrupted
in step 2:(a) skip the *index relations* which have been swept
completely (b) skip the *index blocks* which have been swept.
Level (a) is safe and simple to be implemented; level (b) need to
consider the scenarios that leaf page is split; further
investigation is needed to clarify if it is safe or not.
This implementation adopts *level (a) resuming*.
3. Implementation Plan
We are working on the patch now; I will send the WIP patch to the list
later. I am sorry this late proposal, but I hope it can go into 8.3.
Welcome your comments and ideas.
Galy Lee (lee(dot)galy(at)oss(dot)ntt(dot)co(dot)jp)
NTT Open Source Software Center
pgsql-hackers by date
|Next:||From: Markus Schiltknecht||Date: 2007-02-26 09:42:11|
|Subject: Re: SCMS question|
|Previous:||From: Michael Paesold||Date: 2007-02-26 09:10:47|
|Subject: Re: conversion efforts (Re: SCMS question)|