Re: autovacuum

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: autovacuum
Date: 2006-02-01 20:50:25
Message-ID: 60bqxqls6m.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

matthew(at)zeut(dot)net ("Matthew T. O'Connor") writes:
> Hope that helps. Real world feed-back is always welcome.

While I'm at it, I should throw in an idea that I had a little while
back about a "vacuum request manager."

This is kind of orthogonal to everything else that has been happening
with pg_autovacuum...

One of the troubles we have been hitting with our homebrew scripts is
when locking doesn't turn out, and they start submitting multiple
vacuums at once, which sometimes builds up "to ill."

A thought I had was to create a daemon that would serially process
requests. It would just watch a table of requests, and when it finds
work, start work.

We'd then have some sort of "injection" process that would tell the
daemon "Here's new work!"

Requests would be defined thus:

/* cbbrowne(at)[local]/dba2 vacdb=*/ \d vacuum_requests
Table "public.vacuum_requests"
Column | Type | Modifiers
--------------+--------------------------+------------------------
vtable | text | not null
vhost | text | not null
vdatabase | text | not null
urgency | integer | not null default 1
created_on | timestamp with time zone | not null default now()
completed_on | timestamp with time zone |
failed_at | timestamp with time zone |
Indexes:
"vacuum_requests_pkey" primary key, btree (vtable, vhost, vdatabase, created_on)
"vr_priority" btree (vhost, vdatabase, urgency) WHERE ((completed_on IS NULL) AND (failed_at IS NULL))

/* cbbrowne(at)[local]/dba2 vacdb=*/ \d vacuum_start
Table "public.vacuum_start"
Column | Type | Modifiers
--------------+--------------------------+------------------------
vtable | text | not null
vhost | text | not null
vdatabase | text | not null
started_on | timestamp with time zone | not null default now()
completed_on | timestamp with time zone |
Indexes:
"vacuum_start_pkey" primary key, btree (vtable, vhost, vdatabase, started_on)

/* cbbrowne(at)[local]/dba2 vacdb=*/ \d vacuum_failures
Table "public.vacuum_failures"
Column | Type | Modifiers
------------+--------------------------+------------------------
vtable | text | not null
vhost | text | not null
vdatabase | text | not null
started_on | timestamp with time zone | not null
failed_on | timestamp with time zone | not null default now()
Indexes:
"vacuum_failures_pkey" primary key, btree (vtable, vhost, vdatabase, started_on)

This has a bit more generality than would be needed for handling just
one postmaster; host/database would allow this to be used to manage
multiple backends...

We have, in our "kludged-up scripts," three levels of granularity:

1. There are tables we vacuum every few minutes; they would be at
urgency 1; every few minutes, we would, in effect, run the query...

insert into vacuum_requests (vtable, vhost, vdatabase, urgency)
select t.fqtablename, h.hostname, tld.name, 1
from urgent_tables t, all_hosts h, all_tlds tld;

2. Then, there are "hourly" tables, at urgency level 2.

Once an hour, we run:

insert into vacuum_requests (vtable, vhost, vdatabase, urgency)
select t.fqtablename, h.hostname, tld.name, 2
from hourly_tables t, all_hosts h, all_tlds tld;

3. Once a day, we'd do something kind of like:

insert into vacuum_requests (vtable, vhost, vdatabase, urgency)
select table_schema || '.' || table_name, h.hostname, tld.name, 3
from information_schema.tables, all_hosts h, all_tlds tld
where table_type = 'BASE TABLE' and table_schema in ('public', 'pg_catalog');

The event loop for the daemon would be to look up the highest priority
table, and add an entry to vacuum_start.

Then it vacuums the table.

If that succeeds, the table is marked as complete in both
vacuum_start, and, FOR EVERY ENTRY CURRENTLY OUTSTANDING, in
vacuum_requests. Thus, if a table is queued up 20 times, it will be
vacuumed once, and marked as done 20 times.

If that fails, all the relevant entries in vacuum_start and
vacuum_requests are marked with the failure information, and a record
is added to the failures table.

We're putting this off, pending the thought that, with 8.1, it's worth
testing out pg_autovacuum again.

The above is an "in-the-database" way of queueing up requests,
associating priorities to them, and having the queue be
administrator-visible.

We were anticipating using our present quasi-kludgy scripts to add our
favorite tables to the queue; it would seem a nice/natural thing for
there to be some automatic process (ala the pg_autovacuum daemon) that
could add things to the queue based on its knowledge of updates.

My thought is that if anything about the above appears useful to
pg_autovacuum, I'd be happy if pg_autovacuum grabbed (stole? ;-)) some
of the ideas.
--
"cbbrowne","@","cbbrowne.com"
http://cbbrowne.com/info/sap.html
"The X-Files are too optimistic. The truth is *not* out there..."
-- Anthony Ord <nws(at)rollingthunder(dot)co(dot)uk>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Jim C. Nasby 2006-02-01 21:16:33 Default autovacuum settings too conservative
Previous Message Chris Browne 2006-02-01 19:51:36 Re: autovacuum

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2006-02-01 21:21:16 Re: autovacuum
Previous Message Bruce Momjian 2006-02-01 20:14:53 Re: A note about testing EXEC_BACKEND on recent Linuxen