
Re: autovacuum

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: autovacuum
Date: 2006-02-01 21:21:16
Message-ID: 20060201212116.GN95850@pervasive.com
Lists: pgsql-admin, pgsql-hackers
This seems maybe a bit overkill to me. I think it would be more useful
if autovacuum could execute more than one vacuum at a time, and if you
could specify tables that are high priority (or possibly just say that
all tables with fewer than X live tuples in them are high priority). That
way a longer-running vacuum on a large table wouldn't prevent more
vacuum-sensitive tables (such as queues) from being vacuumed frequently
enough.
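
As a strawman sketch, the "high priority" set could even be derived
from the stats views; n_live_tup here assumes a release where
pg_stat_user_tables exposes live-tuple counts:

     -- flag small, churn-heavy tables (queues and the like) as high
     -- priority; 10000 stands in for the tunable threshold X
     select schemaname, relname
       from pg_stat_user_tables
      where n_live_tup < 10000;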

On Wed, Feb 01, 2006 at 03:50:25PM -0500, Chris Browne wrote:
> matthew(at)zeut(dot)net ("Matthew T. O'Connor") writes:
> > Hope that helps.  Real world feed-back is always welcome.
> 
> While I'm at it, I should throw in an idea that I had a little while
> back about a "vacuum request manager."
> 
> This is kind of orthogonal to everything else that has been happening
> with pg_autovacuum...
> 
> One of the troubles we have been hitting with our homebrew scripts is
> when locking doesn't work out and they start submitting multiple
> vacuums at once, which sometimes builds up to ill effect.
> 
> A thought I had was to create a daemon that would serially process
> requests.  It would just watch a table of requests, and when it finds
> work, start work.
> 
> We'd then have some sort of "injection" process that would tell the
> daemon "Here's new work!"
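> 
> One way to "tell the daemon" might be plain listen/notify (just a
> sketch of the idea; the notification carries no details, it only
> wakes the daemon up to re-scan the request table):
> 
>      -- daemon side, at startup:
>      listen vacuum_work;
> 
>      -- injector side, after queueing new requests:
>      notify vacuum_work;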
> 
> Requests would be defined thus:
> 
> /* cbbrowne(at)[local]/dba2 vacdb=*/ \d vacuum_requests
>                   Table "public.vacuum_requests"
>     Column    |           Type           |       Modifiers        
> --------------+--------------------------+------------------------
>  vtable       | text                     | not null
>  vhost        | text                     | not null
>  vdatabase    | text                     | not null
>  urgency      | integer                  | not null default 1
>  created_on   | timestamp with time zone | not null default now()
>  completed_on | timestamp with time zone | 
>  failed_at    | timestamp with time zone | 
> Indexes:
>     "vacuum_requests_pkey" primary key, btree (vtable, vhost, vdatabase, created_on)
>     "vr_priority" btree (vhost, vdatabase, urgency) WHERE ((completed_on IS NULL) AND (failed_at IS NULL))
> 
> /* cbbrowne(at)[local]/dba2 vacdb=*/ \d vacuum_start
>                    Table "public.vacuum_start"
>     Column    |           Type           |       Modifiers        
> --------------+--------------------------+------------------------
>  vtable       | text                     | not null
>  vhost        | text                     | not null
>  vdatabase    | text                     | not null
>  started_on   | timestamp with time zone | not null default now()
>  completed_on | timestamp with time zone | 
> Indexes:
>     "vacuum_start_pkey" primary key, btree (vtable, vhost, vdatabase, started_on)
> 
> /* cbbrowne(at)[local]/dba2 vacdb=*/ \d vacuum_failures
>                  Table "public.vacuum_failures"
>    Column   |           Type           |       Modifiers        
> ------------+--------------------------+------------------------
>  vtable     | text                     | not null
>  vhost      | text                     | not null
>  vdatabase  | text                     | not null
>  started_on | timestamp with time zone | not null
>  failed_on  | timestamp with time zone | not null default now()
> Indexes:
>     "vacuum_failures_pkey" primary key, btree (vtable, vhost, vdatabase, started_on)
> 
> 
> This has a bit more generality than would be needed for handling just
> one postmaster; host/database would allow this to be used to manage
> multiple backends...
> 
> We have, in our "kludged-up scripts," three levels of granularity:
> 
>  1.  There are tables we vacuum every few minutes; they would be at
>      urgency 1; every few minutes, we would, in effect, run the query...
> 
>      insert into vacuum_requests (vtable, vhost, vdatabase, urgency)
>        select t.fqtablename, h.hostname, tld.name, 1
>           from urgent_tables t, all_hosts h, all_tlds tld;
> 
>  2.  Then, there are "hourly" tables, at urgency level 2.
> 
>      Once an hour, we run:
> 
>      insert into vacuum_requests (vtable, vhost, vdatabase, urgency)
>        select t.fqtablename, h.hostname, tld.name, 2
>           from hourly_tables t, all_hosts h, all_tlds tld;
> 
>  3.  Once a day, we'd do something kind of like:
> 
>      insert into vacuum_requests (vtable, vhost, vdatabase, urgency)
>        select table_schema || '.' || table_name, h.hostname, tld.name, 3 
>          from information_schema.tables, all_hosts h, all_tlds tld 
>          where table_type = 'BASE TABLE' and table_schema in ('public', 'pg_catalog');
> 
> The event loop for the daemon would be to look up the highest priority
> table, and add an entry to vacuum_start.
> 
> Then it vacuums the table.
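> 
> In SQL terms, the dequeue step might look something like this (just a
> sketch; locking between competing daemons is glossed over):
> 
>      -- pick the most urgent outstanding request (urgency 1 is the
>      -- most urgent)...
>      select vtable, vhost, vdatabase
>        from vacuum_requests
>       where completed_on is null and failed_at is null
>       order by urgency, created_on
>       limit 1;
> 
>      -- ...and record that work on it has started
>      insert into vacuum_start (vtable, vhost, vdatabase)
>        values ('public.some_table', 'somehost', 'vacdb');
>      -- (values taken from the row selected above)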
> 
> If that succeeds, the table is marked as complete in both
> vacuum_start, and, FOR EVERY ENTRY CURRENTLY OUTSTANDING, in
> vacuum_requests.  Thus, if a table is queued up 20 times, it will be
> vacuumed once, and marked as done 20 times.
> 
> If that fails, all the relevant entries in vacuum_start and
> vacuum_requests are marked with the failure information, and a record
> is added to the failures table.
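> 
> The bookkeeping for the two outcomes might be roughly (again just a
> sketch; $1-$3 identify the table that was vacuumed, $4 its started_on):
> 
>      -- success: close out the start record, and satisfy *every*
>      -- outstanding request for this table in one go
>      update vacuum_start set completed_on = now()
>       where vtable = $1 and vhost = $2 and vdatabase = $3
>         and completed_on is null;
>      update vacuum_requests set completed_on = now()
>       where vtable = $1 and vhost = $2 and vdatabase = $3
>         and completed_on is null and failed_at is null;
> 
>      -- failure: mark the outstanding requests and log the failure
>      update vacuum_requests set failed_at = now()
>       where vtable = $1 and vhost = $2 and vdatabase = $3
>         and completed_on is null and failed_at is null;
>      insert into vacuum_failures (vtable, vhost, vdatabase, started_on)
>        values ($1, $2, $3, $4);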
> 
> We're putting this off for now, on the theory that, with 8.1, it's
> worth testing out pg_autovacuum again.
> 
> The above is an "in-the-database" way of queueing up requests,
> associating priorities to them, and having the queue be
> administrator-visible.
> 
> We were anticipating using our present quasi-kludgy scripts to add our
> favorite tables to the queue; it would seem a nice/natural thing for
> there to be some automatic process (à la the pg_autovacuum daemon) that
> could add things to the queue based on its knowledge of updates.
> 
> My thought is that if anything about the above appears useful to
> pg_autovacuum, I'd be happy if pg_autovacuum grabbed (stole? ;-)) some
> of the ideas.
> -- 
> "cbbrowne","@","cbbrowne.com"
> http://cbbrowne.com/info/sap.html
> "The X-Files are too optimistic.  The truth is *not* out there..."
> -- Anthony Ord <nws(at)rollingthunder(dot)co(dot)uk>

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby(at)pervasive(dot)com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
