Re: [HACKERS] Autovacuum Improvements

From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Christopher Browne" <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Autovacuum Improvements
Date: 2007-01-19 14:48:26
Message-ID: 200701190648.27950.darcyb@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Friday 19 January 2007 01:47, Simon Riggs wrote:
> On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > elein wrote:
> > > > Have you made any consideration of providing feedback on autovacuum
> > > > to users? Right now we don't even know what tables were vacuumed when
> > > > and what was reaped. This might actually be another topic.
> > >
> > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > to have a log table, on which we'd register what did we run, at what
> > > time, how long did it last, how many tuples did it clean, etc. I feel
> > > having it on the regular text log is useful but it's not good enough.
> > > Keep in mind that in the future we may want to peek at that collected
> > > information to be able to take better scheduling decisions (or at least
> > > inform the DBA that he sucks).
> > >
> > > Now, I'd like this to be a VACUUM thing, not autovacuum. That means
> > > that manually-run vacuums would be logged as well.
> >
> > Yes I did intend this thought for vacuum, not strictly autovacuum.
>
> I agree, for all VACUUMs: we need a log table.
>
> The only way we can get a feedback loop on what has come before is by
> remembering what happened. Simply logging it is interesting, but not
> enough.

Correct, I think we are all saying the same thing that is this log table is
purely inserts so that we can see trends over time.

>
> There is some complexity there, because with many applications a small
> table gets VACUUMed every few minutes, so the log table would become a
> frequently updated table itself. I'd also suggest that we might want to
> take account of the number of tuples removed by btree pre-split VACUUMs
> also.

Thinking on this a bit more, I suppose that this table really should allow for
user defined triggers on it, so that a DBA can create partioning for it, not
to mention being able to move it off into it's own tablespace.

>
> I also like the idea of a single scheduler and multiple child workers.
>
> The basic architecture is clear and obviously beneficial. What worries
> me is how the scheduler will work; there seems to be as many ideas as we
> have hackers. I'm wondering if we should provide the facility of a
> pluggable scheduler? That way you'd be able to fine tune the schedule to
> both the application and to the business requirements. That would allow
> integration with external workflow engines and job schedulers, for when
> VACUUMs need to not-conflict with external events.
>
> If no scheduler has been defined, just use a fairly simple default.
>
> The three main questions are
> - what is the maximum size of VACUUM that can start *now*

How can we determine this given we have no real knowledge of the upcoming
adverse IO conditions ?

> - can *this* VACUUM start now?
> - which is the next VACUUM to run?
>
> If we have an API that allows those 3 questions to be asked, then a
> scheduler plug-in could supply the answers. That way any complex
> application rules (table A is available for VACUUM now for next 60 mins,
> table B is in constant use so we must use vacuum_delay), external events
> (long running reports have now finished, OK to VACUUM), time-based rules
> (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> first 3 days of the each month is financial accounting close) can be
> specified.

Another thought, is it at all possible to do a partial vacuum? ie spend the
next 30 minutes vacuuming foo table, and update the fsm with what hew have
learned over the 30 mins, even if we have not done a full table scan ?

--

Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Ribe 2007-01-19 15:08:39 Re: PG not rejecting bad dates (was Re: Finding bogus
Previous Message Martijn van Oosterhout 2007-01-19 14:47:53 Re: Password encryption method

Browse pgsql-hackers by date

  From Date Subject
Next Message Stefan Kaltenbrunner 2007-01-19 15:25:17 Re: Windows buildfarm failures
Previous Message Bruce Momjian 2007-01-19 13:56:48 Re: [HACKERS] [PATCHES] [BUGS] BUG #2846: inconsistent and