Re: Autovacuum in the backend

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)surnet(dot)cl>, Josh Berkus <josh(at)agliodbs(dot)com>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum in the backend
Date: 2005-06-16 16:54:39
Message-ID: 42B1AECF.9080005@zeut.net
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:

> Alvaro Herrera <alvherre(at)surnet(dot)cl> writes:
>> Now, I'm hearing people don't like using libpq.
>
> Yeah --- a libpq-based solution is not what I think of as integrated at
> all, because it cannot do anything that couldn't be done by the existing
> external autovacuum process. About all you can buy there is having the
> postmaster spawn the autovacuum process, which is slightly more
> convenient to use but doesn't buy any real new functionality.

Yes, libpq has to go; I thought that was clear, but perhaps I didn't say
it clearly enough. In any case, this was the stumbling block that
prevented me from making more progress on autovacuum integration.

>> Some people say "keep it simple and have one process per cluster." I
>> think they don't realize it's actually more complex, not the other way
>> around.
>
> A simple approach would be a persistent autovac background process for
> each database, but I don't think that's likely to be acceptable because
> of the amount of resources tied up (PGPROC slots, open files, etc).

Agreed, this seems ugly.

> One thing that might work is to have the postmaster spawn an autovac
> process every so often. The first thing the autovac child does is pick
> up the current statistics dump file (which it can find without being
> connected to any particular database). It looks through that to
> determine which database is most in need of work, then connects to that
> database and does some "reasonable" amount of work there, and finally
> quits. Awhile later the postmaster spawns another autovac process that
> can connect to a different database and do work there.

I don't think you can use a stats dump to determine which database
should be handled next, since you don't really know what has happened
since the last time the process exited. What was a priority 5 or 10
minutes ago might not be a priority now.

> This design would mean that the autovac process could not have any
> long-term state of its own: any long-term state would have to be in
> either system catalogs or the statistics. But I don't see that as
> a bad thing really --- exposing the state will be helpful from a
> debugging and administrative standpoint.

This is not a problem: my patch, which Alvaro has now taken over,
already created a new system catalog for all autovac data, so autovacuum
really doesn't need to keep any persistent state of its own.

The rough design I had in mind was:
1) On startup postmaster spawns the master autovacuum process
2) The master autovacuum process spawns backends to do the vacuuming
work on a particular database
3) The master autovacuum waits for this process to exit, then spawns the
next backend for the next database
4) Repeat this loop until all databases in the cluster have been
checked, then sleep for a while, and start over again.

I'm not sure if this is feasible, or whether this special master
autovacuum process would be able to fork off, or request that the
postmaster fork off, an autovacuum process for a particular database in
the cluster. Thoughts or comments?

Matthew
