Re: Autovacuum in the backend

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)surnet(dot)cl>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum in the backend
Date: 2005-06-16 18:42:13
Message-ID: 42B1C805.507@skype.net
Lists: pgsql-general pgsql-hackers


On Thu, 2005-06-16 at 11:42 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)surnet(dot)cl> writes:
...
> > Some people say "keep it simple and have one process per cluster." I
> > think they don't realize it's actually more complex, not the other way
> > around.
>
> Agreed. If you aren't connected to a specific database, then you cannot
> use any of the normal backend infrastructure for catalog access, which
> is pretty much a killer limitation.
>
> A simple approach would be a persistent autovac background process for
> each database, but I don't think that's likely to be acceptable because
> of the amount of resources tied up (PGPROC slots, open files, etc).

In that case it should also be configurable which databases get
their own AV processes.

Also, there is probably no need to keep an AV process running for very
long after the last "real" backend for that database has closed, as
there won't be any changes anyway.

Having one AV process per DB will likely be a problem only for
installations with very many single-user, always-connected databases,
and I don't expect there to be many of those.

And I also expect that soon (after my vacuums-don't-step-on-each-other
patch goes in) there will be a need to run several vacuums in parallel
on the same database - say, one with non-intrusive vacuum_page settings
for a really large table and several more aggressive ones for
fast-changing small tables at the same time. AFAICS this will also need
several backends, at least one for each parallel vacuum.

> One thing that might work is to have the postmaster spawn an autovac
> process every so often.

My fastest manual vacuum does its job in 5 seconds and is repeated at
10-second intervals - will this design be able to match that?

> The first thing the autovac child does is pick
> up the current statistics dump file (which it can find without being
> connected to any particular database). It looks through that to
> determine which database is most in need of work, then connects to that
> database and does some "reasonable" amount of work there, and finally
> quits. Awhile later the postmaster spawns another autovac process that
> can connect to a different database and do work there.
>
> This design would mean that the autovac process could not have any
> long-term state of its own: any long-term state would have to be in
> either system catalogs or the statistics. But I don't see that as
> a bad thing really --- exposing the state will be helpful from a
> debugging and administrative standpoint.
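The stateless spawn-pick-work-quit loop described above could be
sketched roughly like this. This is only a toy simulation of the
scheduling idea, not PostgreSQL code: the function names, the
dead-tuple counts, and the work quantum are all made up, and the real
postmaster would fork a backend and read the pgstat dump file rather
than operate on an in-memory dict.

```python
# Toy sketch of the proposed autovac cycle (hypothetical names and data;
# a stand-in for "read stats file, pick neediest DB, do bounded work, quit").

def pick_neediest(stats):
    """Pick the database most in need of work - here approximated
    by the largest accumulated dead-tuple count."""
    return max(stats, key=lambda db: stats[db]["dead_tuples"])

def autovac_iteration(stats, work_quantum=1000):
    """One autovac child: choose a database, do a 'reasonable'
    amount of work there, then exit (no long-term state kept)."""
    db = pick_neediest(stats)
    done = min(stats[db]["dead_tuples"], work_quantum)
    stats[db]["dead_tuples"] -= done
    return db, done

# Two databases with different amounts of pending work:
stats = {"sales": {"dead_tuples": 5000},
         "tiny":  {"dead_tuples": 200}}

db, done = autovac_iteration(stats)   # first child picks "sales"
```

Because each child recomputes its choice from the statistics on every
cycle, all scheduling state lives in the stats (or catalogs), which is
exactly the property Tom describes as helpful for debugging.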

--
Hannu Krosing <hannu(at)skype(dot)net>
