Re: Autovacuum improvements

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum improvements
Date: 2007-01-14 19:39:15
Message-ID: 45AA86E3.5090803@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

First, thanks for working on this. I hope to be helpful with the design
discussion and possibly some coding if I can find the time.

My initial reaction to this proposal is that it seems overly complex,
however I don't see a more elegant solution. I'm a bit concerned that
most users won't figure out all the knobs.

Alvaro Herrera wrote:
> I've been thinking how to improve autovacuum so that we can convince
> more people that it can be enabled by default.

I would like to see it enabled by default too, however the reason it
isn't already enabled by default is that it caused failures in the
regression test when we tried to turn it on during the 8.2 dev cycle and
it was too close to beta to fix everything. All this new machinery is
great, but it doesn't address that problem.

> Here are my thoughts.
> There are two areas of improvements:
>
> 1. scheduling, and
> 2. process handling, i.e., how to have multiple vacuum processes running
> at any time.

Fail enough, but I would say the two biggest area for improvement are
scheduling and preventing "HOT" tables from becoming vacuum starved
(essentially what you said, but with a different emphasis).

[snip]

> Process Handling
> ================
>
> My idea here is to morph the current autovacuum daemon from an agent
> that itself runs a vacuum command, into something that launches other
> processes to run those vacuum commands. I'll call this "the autovacuum
> launcher process", or the launcher for short. The idea here is that the
> launcher can take care of the scheduling while the worker processes do
> their work. If the launcher then determines that a particular instant
> there should be two vacuums running, then it simply starts two worker
> processes.

How about calling it the autovacuum_master process?

[snip autovacuum launcher process description]

That all sounds reasonable to me. I think the harder part is what you
are getting at below (how to get the launcher to figure out what to
vacuum when).

> Scheduling
> ==========
> We introduce the following concepts:
>
> 1. table groups. We'll have a system catalog for storing OID and group
> name, and another catalog for membership, linking relid to group OID.
>
> pg_av_tablegroup
> tgrname name
>
> pg_av_tgroupmembers
> groupid oid
> relid oid
>
> 2. interval groups. We'll have a catalog for storing igroup name and
> OID, and another catalog for membership. We identify an interval by:
> - month of year
> - day of month
> - day of week
> - start time of day
> - end time of day
>
> This is modelled after crontabs.
>
> pg_av_intervalgroup
> igrname name
>
> pg_av_igroupmembers
> groupid oid
> month int
> dom int
> dow int
> starttime timetz
> endtime timetz

This seems to assume that the start and end time for an interval will be
on the same day, you probably need to specify a start month, dom, dow,
time and an end month, dom, dow and time.

Since this is modeled after cron, do we allow wild-cards, or any of the
other cron tricks like */20 or 1-3,5,7,9-11?

Also your notation above is ambiguous, it took me a while to realize
that pg_av_igroupmembers.groupid wasn't referencing the id from
pg_av_tablegroup.

> Additionally, we'll have another catalog on which we'll store table
> groups to interval groups relationships. On that catalog we'll also
> store those autovacuum settings that we want to be able to override:
> whether to disable it for this interval group, or the values for the
> vacuum/analyze equations.
>
> pg_av_schedule
> tgroup oid
> igroup oid
> enabled bool
> queue int
> vac_base_thresh int
> vac_scale_factor float
> anl_base_thresh int
> anl_scal_factor float
> vac_cost_delay int
> vac_cost_limit int
> freeze_min_age int
> freeze_max_age int
>

What is queue for?

> So the scheduler, at startup, loads the whole schedule in memory, and
> then wakes up at reasonable intervals and checks whether these equations
> hold for some of the tables it's monitoring. If they do, then launch a
> new worker process to do the job.
>
> We need a mechanism for having the scheduler rescan the schedule when a
> user modifies the catalog -- maybe having a trigger that sends a signal
> to the process is good enough (implementation detail: the signal must be
> routed via the postmaster, since the backend cannot hope to know the
> scheduler's PID. This is easy enough to do.)

This all looks reasonable if not a bit complex. Question, what happens
to the current pg_autovacuum relation?

Also what about system defaults, will we have a hard coded default
interval of always on, and one default table group that contains all the
tables with one default entry in pg_av_schedule?

I think we need more discussion on scheduling, we need to make sure this
solves the vacuum starvation problem. Does the launcher process
consider each row in pg_av_schedule that applies at the current time
separately? That is say there are three entries in pg_av_schedule that
apply right now, does that mean that the launcher can fire off three
different vacuums? Perhaps we need to add a column to pg_av_tablegroup
that specifies the max number of concurrent worker processes for this
table group.

Also, I don't think we need the concept of queues as described in recent
threads. I think the idea of the queues was the the system would be
able to automatically find small tables and vacuum them frequently, in
this proposal the admin would have to create a group for small tables
and manually add tables to the group and make sure that there are enough
worker processes for that group to prevent vacuum starvation. Perhaps
we can create a dynamic group that includes all tables with less than a
certain number of rows or blocks?

Thanks for working on this!

Matt O'Connor

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-01-14 19:43:09 Re: Autovacuum improvements
Previous Message Tom Lane 2007-01-14 19:38:33 Re: [COMMITTERS] pgsql: Add support for xmlval IS DOCUMENT expression.

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2007-01-14 19:43:09 Re: Autovacuum improvements
Previous Message Joshua D. Drake 2007-01-14 16:45:24 Re: Autovacuum improvements