Re: Autovacuum integration

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Autovacuum integration
Date: 2005-07-08 19:56:25
Message-ID: 27850.1120852585@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Here is a second attempt at autovacuum integration.

A few comments:

* I strongly disagree with keeping updatable state in a catalog. In the
first place, that will cause autovac itself to create vacuumable trash.
In the second place, that means you can't modify the internal state kept
by autovacuum without forcing initdb; which is not a good situation,
considering how much change this code is likely to go through.
I think all the updatable state should be kept in the pgstats file.
(Memo to self: let's add a version number header to the pgstats file,
so that we can change its format without requiring an initdb to do so.)
pg_autovacuum should only contain user-settable parameters --- which is
still putting us at nontrivial risk for initdb constraints, but it's way
better than internal state. If you do that, then pg_autovacuum need
only contain entries for tables that have been given non-default
parameter settings (ie, those for which the user wants to override
global settings given in postgresql.conf).

* I'm pretty dubious about adding a syscache for pg_autovacuum.
Where exactly are the repeated fetches going to come from to justify
caching it?

* The signal processing needs re-work; you can't just randomly assign
signal numbers, you need to keep these things largely consistent with
the other subprocesses. In particular, if this thing is going to be
running transactions then it MUST honor SIGALRM (eg, for deadlock
timeout checks) and SIGUSR1 (sinval catchup interrupt), and I don't
think you get to ignore SIGTERM either (may get this from init).
I think it's a pretty bad idea to use SIGUSR2 for something other than
what regular backends use it for, too. (Consider the prospect that for
some reason your PID occurs in pg_listener.) It'd be a good idea to
honor SIGINT with the normal meaning (query cancel, ie, abort
transaction, which in this context would also imply process shutdown)
and use that to shut down the daemon at postmaster stop. In short,
the signal handling had better look a whole lot more like a normal
backend's.

* Speaking of shutdown, you can't stop the bgwriter until you know
that the autovac daemon is dead. In this respect too, it's much more
like a backend than like any of the other support processes. Signal it
when you signal the backends, and don't advance to the bgwriter kill
phase until autovac and all the backends are known gone.

* I see you have an autovac_init function to "annoy the user", but
shouldn't this be checked every time we are about to spawn an autovac
process?

* I don't see any special checks for shared catalogs, which means they
are probably going to be over-vacuumed; or possibly under-vacuumed if
you fail to track the update stats for them in a single place rather
than in each database. It'd probably be a good idea to nominate just
one database to be the place from which shared catalogs are vacuumed;
or treat them as a completely separate special case.

* I have no objection to adding extra entry points to vacuum.c to
simplify the calls to it.

* With respect to comments like:

+ /*
+ * We just skip a table not found on the stat hash. This is
+ * because whatever we do here, there won't be good statistics
+ * next time around, so we would do this same action again. It is
+ * tempting to issue an ANALYZE, but it won't work because ANALYZE
+ * doesn't update the stat system.
+ */

ISTM the entire point of "autovac integration" is to make sure the rest
of the system does what's needed to support autovac. If ANALYZE needs
to send something to the stats system, make it do so. (ISTM it'd be
reasonable for ANALYZE to make and report an estimate of the dead tuple
count, which after all is what you are hoping all the rest of the stats
machinery will derive for you...)

regards, tom lane

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Marko Kreen 2005-07-08 20:03:49 Re: [patch 0/2] Add Fortuna PRNG to pgcrypto
Previous Message Bruno Wolff III 2005-07-08 19:10:47 Re: [patch 0/2] Add Fortuna PRNG to pgcrypto