Re: Autovacuum Improvements

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Autovacuum Improvements
Date: 2007-01-08 21:29:51
Message-ID: 60irfh6wc0.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

nagy(at)ecircle-ag(dot)com (Csaba Nagy) writes:
> On Sun, 2006-12-24 at 03:03, Christopher Browne wrote:
> [snip]
>> Seems to me that you could get ~80% of the way by having the simplest
>> "2 queue" implementation, where tables with size < some threshold get
>> thrown at the "little table" queue, and tables above that size go to
>> the "big table" queue.
>
> That would most definitely not cut it for me, I have more than 2
> categories of tables:
>
> - a few small but very often updated/inserted/deleted table: these must
> be continuously vacuumed, your "little queue" is not good enough for
> that, as even the round trip between the small tables could lead to
> bloat on them;

I disagree; if we added more "work processes," that could eat quickly
through the short end of the queue.

> - a few small and moderately updated, that could live with the "little
> queue";
>
> - a few big and frequently updated, but which only have a small
> percentage of rows actively updated at any time: those could live with
> the big queue;
>
> - the rest which are rarely updated, I would put those in a separate
> queue so they won't affect the rest, cause vacuuming them is really
> mostly not critical;
>
> The point is that I'm not sure there couldn't be even more reasons
> to split the tables in even more queues based on the importance of
> vacuuming them combined with update rate and their size. If I can
> set up my own queues I can experiment with what works best for
> me... for the base setup you could set up some default queues. I
> wonder though how would you handle dynamics of tables, I mean when
> will a small table which grows start to be considered a big table
> for the purpose of putting it in one queue or the other ? I guess it
> would be done on analyzing the table, which is also handled by
> autovacuum, so tables with no vacuum queue settings could go to one
> of the 2 default queues you mention.

The heuristic I was thinking of didn't involve having two queues, but
rather just 1. By having some size information, work processes could
eat at the queue from both ends.

If you have cases where tables need to be vacuumed *really*
frequently, then you make sure that they are being injected
frequently, and that some of the workers are tied to Just Doing Small
Tables.

I think that *does* cover your scenario quite adequately, and without
having to get into having a bunch of queues.

The heuristic is incomplete in one other fashion, namely that it
doesn't guarantee that tables in the middle will ever get "gotten to."
That mandates having a third policy, namely to have a worker that goes
through tables in the (singular) queue some form of chronological
order.

>> That should keep any small tables from getting "vacuum-starved."
>>
>> I'd think the next step would be to increase the number of queues,
>> perhaps in a time-based fashion. There might be times when it's
>> acceptable to vacuum 5 tables at once, so you burn thru little tables
>> "like the blazes," and handle larger ones fairly promptly. And other
>> times when you don't want to do *any* big tables, and limit a single
>> queue to just the itty bitty ones.
>
> This is all nice and it would be cool if you could set it up per vacuum
> queue. I mean how much more effort would be to allow vacuum queues with
> generic settings like time windows with max number of threads for each
> window, and let the user explicitly assign tables to those queues,
> instead of hard coding the queues and their settings and assign tables
> to them based on size or any other heuristics ?
>
> For the average application which needs simple settings, there could be
> a default setup with the 2 queues you mention. If it would be possible
> to set up some rules to assign tables to queues based on their
> properties on analyze time, instead of explicitly assigning to one queue
> or other, that would be nice too, and then you can completely cover the
> default setup with those settings, and allow for more complex setups for
> those who need it.

My thinking has headed more towards simplifying this; two queues seems
to be one too many :-).

>> This approach allows you to stay mostly heuristic-based, as opposed to
>> having to describe policies in gratuitous detail.
>
> I agree that for simple setups that would be OK, but like I said, if it
> would be easy enough to code that heuristics, and provide some sane
> setup as default, and then let the user optimize it, that would be a
> cool solution.
>
> Now it's true I don't really know how would you code 'assign all tables
> which are smaller than x rows to vacuum queue "little-tables"' ... maybe
> by providing a function to the queue which "matches" on the table ? And
> you can change that function ? No idea, but it probably can be done...

Based on the three policies I've seen, it could make sense to assign
worker policies:

1. You have a worker that moves its way through the queue in some sort of
sequential order, based on when the table is added to the queue, to
guarantee that all tables get processed, eventually.

2. You have workers that always pull the "cheapest" tables in the
queue, perhaps with some sort of upper threshold that they won't go
past.

3. You have workers that alternate between eating from the two ends of the
queue.

Only one queue is needed, and there's only one size parameter
involved.

Having multiple workers of type #2 seems to me to solve the problem
you're concerned about.
--
(format nil "~S(at)~S" "cbbrowne" "cbbrowne.com")
http://cbbrowne.com/info/spiritual.html
<a href="http://www.netizen.com.au/">thorfinn(at)netizen(dot)com(dot)au</a>
Millihelen, adj:
The amount of beauty required to launch one ship.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Raymond O'Donnell 2007-01-08 21:32:39 Re: Slony in Windows installer?
Previous Message Brandon Aiken 2007-01-08 21:24:51 Re: Database Failure on Windows XP Pro psql (PostgreSQL)

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2007-01-08 21:33:42 Re: -f <output file> option for pg_dumpall
Previous Message Magnus Hagander 2007-01-08 20:33:06 Re: README for vcbuild