Autovacuum improvements

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Autovacuum improvements
Date: 2007-01-14 13:18:07
Message-ID: 20070114131807.GA22879@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

I've been thinnking how to improve autovacuum so that we can convince
more people that it can be enabled by default. Here are my thoughts.
There are two areas of improvements:

1. scheduling, and
2. process handling, i.e., how to have multiple vacuum processes running
at any time.

I ripped out the part about having multiple "vacuum queues", as it was
incomplete and it was also getting too complex. We need to discuss how
to do that, because it's a fundamental part of this proposal; the idea
is to be able to have several vacuums running at any time, but we need
to find a way to specify a policy for it.

Process Handling
================

My idea here is to morph the current autovacuum daemon from an agent
that itself runs a vacuum command, into something that launches other
processes to run those vacuum commands. I'll call this "the autovacuum
launcher process", or the launcher for short. The idea here is that the
launcher can take care of the scheduling while the worker processes do
their work. If the launcher then determines that a particular instant
there should be two vacuums running, then it simply starts two worker
processes.

The launcher would be running continuously, akin to the postmaster, but
would be obviously under control of the latter, so it's postmaster's
responsability to start and stop the launcher. The launcher would be
connected to shared memory, so it can scan system catalogs to load the
schedule (stored in system catalogs) into memory. If the launcher dies,
the postmaster should treat it like any other process' crash and cause a
restart cycle.

The workers would not be postmaster's direct children, which could be a
problem. I'm open to ideas here, but I don't like using the postmaster
directly as a launcher, because of the shmem connection, which would
take robustness away from the postmaster. One idea to solve this is to
have the launcher process communicate child process IDs to the
postmaster, so that when it (the postmaster) wants to stop, it has those
additional PIDs in its process list and can signal them to stop. The
launcher process would also signal when it detects that one of the
workers stopped, and the postmaster would remove that process from the
list. This communication could be made to happen via named pipes, and
since the messages are so simple, there's no reliability concern for the
postmaster; it's very easy to verify that a message is correct by
checking whether the process is actually killable by kill(0).

Another idea that I discarded was to have the launcher communicate back
to the postmaster when new workers should be started. My fear is that
this type of communication (a lot more complex that just sending a PID)
could be a cause for postmaster instability.

Scheduling
==========

We introduce the following concepts:

1. table groups. We'll have a system catalog for storing OID and group
name, and another catalog for membership, linking relid to group OID.

pg_av_tablegroup
tgrname name

pg_av_tgroupmembers
groupid oid
relid oid

2. interval groups. We'll have a catalog for storing igroup name and
OID, and another catalog for membership. We identify an interval by:
- month of year
- day of month
- day of week
- start time of day
- end time of day

This is modelled after crontabs.

pg_av_intervalgroup
igrname name

pg_av_igroupmembers
groupid oid
month int
dom int
dow int
starttime timetz
endtime timetz

Additionally, we'll have another catalog on which we'll store table
groups to interval groups relationships. On that catalog we'll also
store those autovacuum settings that we want to be able to override:
whether to disable it for this interval group, or the values for the
vacuum/analyze equations.

pg_av_schedule
tgroup oid
igroup oid
enabled bool
queue int
vac_base_thresh int
vac_scale_factor float
anl_base_thresh int
anl_scal_factor float
vac_cost_delay int
vac_cost_limit int
freeze_min_age int
freeze_max_age int

So the scheduler, at startup, loads the whole schedule in memory, and
then wakes up at reasonable intervals and checks whether these equations
hold for some of the tables it's monitoring. If they do, then launch a
new worker process to do the job.

We need a mechanism for having the scheduler rescan the schedule when a
user modifies the catalog -- maybe having a trigger that sends a signal
to the process is good enough (implementation detail: the signal must be
routed via the postmaster, since the backend cannot hope to know the
scheduler's PID. This is easy enough to do.)

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
(Paul Graham)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet Singh 2007-01-14 16:18:15 Re: [PATCHES] [HACKERS] [Fwd: Index Advisor]
Previous Message Peter Eisentraut 2007-01-14 13:11:54 pgsql: Add support for xmlval IS DOCUMENT expression.

Browse pgsql-patches by date

  From Date Subject
Next Message Gurjeet Singh 2007-01-14 16:18:15 Re: [PATCHES] [HACKERS] [Fwd: Index Advisor]
Previous Message Pavel Stehule 2007-01-14 12:57:09 scrollable cursor sup. for SPI