Re: autovacuum next steps, take 2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>
Subject: Re: autovacuum next steps, take 2
Date: 2007-02-27 03:02:47
Message-ID: 9943.1172545367@sss.pgh.pa.us
Lists: pgsql-hackers

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> On Mon, Feb 26, 2007 at 09:22:42PM -0500, Tom Lane wrote:
>> I'm not liking any of these very much, as they seem critically dependent
>> on impossible-to-tune parameters. I think it'd be better to design this
>> around having the first worker explicitly expose its state (list of
>> tables to process, in order) and having subsequent workers key off that
>> info.

> The real problem is trying to set that up in such a fashion that keeps
> hot tables frequently vacuumed;

Certainly, but it's not clear where that behavior emerges from Alvaro's
or Matthew's proposals, either.

Are we assuming that no single worker instance will vacuum a given table
more than once? (That's not a necessary assumption, certainly, but
without it there are so many degrees of freedom that I'm not sure how
it should act.) Given that assumption, the maximum vacuuming rate for
any table is once per autovacuum_naptime, and most of the magic lies in
the launcher's algorithm for deciding which databases to launch workers
into.

I'm inclined to propose an even simpler algorithm in which every worker
acts alike; its behavior is:
1. On startup, generate a to-do list of tables to process, sorted in
priority order.
2. For each table in the list, if the table is still around and has not
been vacuumed by someone else since you started (including the case of
a vacuum-in-progress), then vacuum it.
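
A minimal sketch of the two steps above, in Python for brevity rather than
backend C. All names here (Table, priority, build_todo_list, run_worker, and
the three callbacks) are hypothetical and are not taken from the PostgreSQL
source; the priority score in particular is an assumed placeholder, since the
message does not say how tables would be ranked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Table:
    name: str
    priority: float  # hypothetical urgency score; higher means vacuum sooner


def build_todo_list(tables: Iterable[Table]) -> List[str]:
    """Step 1: on startup, list the tables to process in priority order."""
    ordered = sorted(tables, key=lambda t: t.priority, reverse=True)
    return [t.name for t in ordered]


def run_worker(
    todo: List[str],
    still_exists: Callable[[str], bool],
    done_by_other: Callable[[str], bool],
    vacuum: Callable[[str], None],
) -> List[str]:
    """Step 2: walk the list, skipping tables that are gone or already handled.

    done_by_other is assumed to return True both for tables another worker
    has finished since this worker started and for a vacuum in progress.
    Returns the names of the tables this worker actually vacuumed.
    """
    vacuumed = []
    for name in todo:
        if not still_exists(name):
            continue  # table was dropped after the list was built
        if done_by_other(name):
            continue  # someone else did it, or is doing it right now
        vacuum(name)
        vacuumed.append(name)
    return vacuumed
```

Because every worker runs the same loop, any cross-worker coordination lives
entirely in whatever backs done_by_other.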

Detecting "already vacuumed since you started" is a bit tricky; you
can't really rely on the stats collector since its info isn't very
up-to-date. That's why I was thinking of exposing the to-do lists
explicitly; comparing those with an advertised current-table would
allow accurate determination of what had just gotten done.
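
One possible reading of that comparison, sketched in Python under stated
assumptions: each worker advertises its ordered to-do list plus the index of
the table it is currently vacuuming, and a table counts as handled if it sits
at or before the current position in some other worker's list. WorkerState
and handled_by_others are hypothetical names, and the sketch ignores worker
start times and any locking around the shared state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerState:
    todo: List[str]                # advertised to-do list, in processing order
    current: Optional[int] = None  # index of the table being vacuumed, None if not begun


def handled_by_others(table: str, me: WorkerState, workers: List[WorkerState]) -> bool:
    """True if another worker has passed, or is currently on, the given table."""
    for w in workers:
        if w is me or w.current is None:
            continue  # skip ourselves and workers that have not begun
        # Entries before `current` are finished or were skipped because
        # someone else handled them; the entry at `current` is in progress.
        if table in w.todo[: w.current + 1]:
            return True
    return False
```

For example, with one worker at index 1 of ["x", "y", "z"], a second worker
would skip "x" and "y" but still vacuum "z".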

regards, tom lane
