Re: Autovacuum Improvements

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Autovacuum Improvements
Date: 2007-01-11 03:25:44
Message-ID: 871wm2utvr.fsf@wolfe.cbbrowne.com
Lists: pgsql-general pgsql-hackers

A long time ago, in a galaxy far, far away, nagy(at)ecircle-ag(dot)com (Csaba Nagy) wrote:
> On Mon, 2007-01-08 at 22:29, Chris Browne wrote:
> [snip]
>> Based on the three policies I've seen, it could make sense to assign
>> worker policies:
>>
>> 1. You have a worker that moves its way through the queue in some sort of
>> sequential order, based on when the table is added to the queue, to
>> guarantee that all tables get processed, eventually.
>>
>> 2. You have workers that always pull the "cheapest" tables in the
>> queue, perhaps with some sort of upper threshold that they won't go
>> past.
>>
>> 3. You have workers that alternate between eating from the two ends of the
>> queue.
>>
>> Only one queue is needed, and there's only one size parameter
>> involved.
>> Having multiple workers of type #2 seems to me to solve the problem
>> you're concerned about.
>
> This sounds better, but define "cheapest" in #2... I actually want to
> continuously vacuum tables which are small, heavily recycled
> (insert/update/delete), and which would bloat quickly. So how do you
> define the cost function for having these tables the "cheapest" ?

Cost would be based on the number of pages in the table. The smallest
tables are obviously the cheapest to vacuum.
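
To make the three worker policies concrete, here is a rough Python sketch. Everything in it is hypothetical (the `QueuedTable` record, the function names, the `max_pages` cap); it only illustrates using page count as the single cost parameter. For policy 3 it assumes the "two ends of the queue" are the cheapest and the most expensive entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedTable:
    name: str
    pages: int      # cost proxy: number of pages in the table
    queued_at: int  # when the table was added to the queue


def pick_sequential(queue):
    """Policy 1: oldest entry first, so every table is reached eventually."""
    return min(queue, key=lambda t: t.queued_at, default=None)


def pick_cheapest(queue, max_pages=None):
    """Policy 2: smallest table first, skipping tables above an optional cap."""
    eligible = [t for t in queue if max_pages is None or t.pages <= max_pages]
    return min(eligible, key=lambda t: t.pages, default=None)


def make_alternating_picker():
    """Policy 3: alternate between the cheapest and the most expensive entry."""
    take_cheapest = True

    def pick(queue):
        nonlocal take_cheapest
        if not queue:
            return None
        chooser = min if take_cheapest else max
        take_cheapest = not take_cheapest
        return chooser(queue, key=lambda t: t.pages)

    return pick
```

Note that all three pickers read the same queue and the same `pages` value; only the selection rule differs.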

That's separate from the policy for adding tables to the queue; THAT
would sensibly be based on the number of dead tuples; the current
policy of autovacuum seems not unreasonable...
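
As a sketch of that separation, the admission test below follows the documented autovacuum rule (vacuum threshold = base threshold + scale factor * number of tuples), while the ordering of admitted tables uses page count only. The function names and the tuple layout are made up for illustration, and the numeric settings in any call are whatever the caller passes in, not recommended values.

```python
def needs_vacuum(dead_tuples, reltuples, base_threshold, scale_factor):
    """Admission: queue a table once dead tuples exceed the vacuum threshold.

    base_threshold and scale_factor play the roles of
    autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor.
    """
    threshold = base_threshold + scale_factor * reltuples
    return dead_tuples > threshold


def build_queue(tables, base_threshold, scale_factor):
    """Admit tables by dead tuples, then order the result by cost (pages).

    `tables` is a list of (name, pages, reltuples, dead_tuples) tuples,
    a hypothetical layout chosen for this sketch.
    """
    admitted = [
        (name, pages)
        for name, pages, reltuples, dead in tables
        if needs_vacuum(dead, reltuples, base_threshold, scale_factor)
    ]
    return sorted(admitted, key=lambda entry: entry[1])
```

A small, heavily recycled table crosses the dead-tuple threshold often, so it keeps getting re-admitted, and its low page count puts it at the cheap end each time.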

> And how will you define the worker thread count policy ? Always 1
> worker per category, or you can define the number of threads in the
> 3 categories ? Or you still have in mind time window policies with
> allowed number of threads per worker category ? (those numbers could
> be 0 to disable a worker category).

It would make a lot of sense to have time ranges that would indicate
when different values were wanted. Good question...
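
One way this might look, purely as a sketch: a schedule of time windows, each giving a worker count per category, with 0 disabling a category for that window. The window boundaries, category names, and counts below are invented for illustration.

```python
from datetime import time

# Hypothetical schedule: (window start, window end, workers per category).
# Windows are half-open: start <= now < end.
SCHEDULE = [
    (time(0, 0), time(6, 0), {"sequential": 2, "cheapest": 2, "alternating": 1}),
    (time(6, 0), time(22, 0), {"sequential": 0, "cheapest": 3, "alternating": 0}),
]

# Used when no window matches the current time.
DEFAULT_WORKERS = {"sequential": 1, "cheapest": 1, "alternating": 0}


def workers_for(now, schedule=SCHEDULE, default=DEFAULT_WORKERS):
    """Return the worker counts that apply at the given time of day."""
    for start, end, counts in schedule:
        if start <= now < end:
            return counts
    return default
```

With this schedule, `workers_for(time(12, 0))` gives only "cheapest" workers during the day, and anything after 22:00 falls back to the default.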

> Other thing, how will the vacuum queue be populated ? Or the "queue"
> here means nothing, all workers will always go through all tables to
> pick one based on their own criteria ? My concern here is that the
> current way of checking 1 DB per minute is not going to work with
> category #2 tables, they really have to be vacuumed continuously
> sometimes.

I think it makes considerable sense to have a queue table for this,
and to have one of the threads look for new entries to add to it.

Offering the Gentle DBA the ability to add in entries based on their
special knowledge would also seem sensible.
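
A minimal sketch of such a queue table, using Python's sqlite3 module as a stand-in for the real thing. The table name, columns, and helper functions are all hypothetical; the point is only that the scanning thread and the DBA insert into the same table, and workers pull from it.

```python
import sqlite3


def make_queue():
    """Create an in-memory queue table (stand-in for a real catalog table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE vacuum_queue (
            table_name TEXT PRIMARY KEY,
            pages      INTEGER NOT NULL,
            added_by   TEXT NOT NULL,     -- 'scanner' or 'dba'
            queued_at  INTEGER NOT NULL
        )
        """
    )
    return conn


def enqueue(conn, table_name, pages, added_by, queued_at):
    """Add a table to the queue; a table already queued is left alone."""
    conn.execute(
        "INSERT OR IGNORE INTO vacuum_queue VALUES (?, ?, ?, ?)",
        (table_name, pages, added_by, queued_at),
    )


def dequeue_cheapest(conn):
    """Remove and return the name of the smallest queued table, or None."""
    row = conn.execute(
        "SELECT table_name FROM vacuum_queue ORDER BY pages, queued_at LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM vacuum_queue WHERE table_name = ?", (row[0],))
    return row[0]
```

The scanner thread would call `enqueue(..., added_by="scanner", ...)` as it finds candidates, and a DBA could insert rows by hand with `added_by="dba"`; the workers do not need to care which it was.
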
--
(format nil "~S(at)~S" "cbbrowne" "gmail.com")
http://linuxdatabases.info/info/slony.html
Keeping instructions and operands in different memories saves .20
(.09) microseconds.
