Re: [HACKERS] Autovacuum Improvements

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, Pavan Deolasee <pavan(at)enterprisedb(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Autovacuum Improvements
Date: 2007-01-19 21:36:50
Message-ID: 200701192136.l0JLao710431@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Added to TODO:

> o Allow multiple vacuums so large tables do not starve small
> tables
>
> http://archives.postgresql.org/pgsql-general/2007-01/msg00031.php
>
> o Improve control of auto-vacuum
>
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00876.php

---------------------------------------------------------------------------

Darcy Buskermolen wrote:
> On Friday 19 January 2007 01:47, Simon Riggs wrote:
> > On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
> > > On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
> > > > elein wrote:
> > > > > Have you made any consideration of providing feedback on autovacuum
> > > > > to users? Right now we don't even know what tables were vacuumed when
> > > > > and what was reaped. This might actually be another topic.
> > > >
> > > > I'd like to hear other people's opinions on Darcy Buskermolen proposal
> > > > to have a log table, on which we'd register what did we run, at what
> > > > time, how long did it last, how many tuples did it clean, etc. I feel
> > > > having it on the regular text log is useful but it's not good enough.
> > > > Keep in mind that in the future we may want to peek at that collected
> > > > information to be able to take better scheduling decisions (or at least
> > > > inform the DBA that he sucks).
> > > >
> > > > Now, I'd like this to be a VACUUM thing, not autovacuum. That means
> > > > that manually-run vacuums would be logged as well.
> > >
> > > Yes I did intend this thought for vacuum, not strictly autovacuum.
> >
> > I agree, for all VACUUMs: we need a log table.
> >
> > The only way we can get a feedback loop on what has come before is by
> > remembering what happened. Simply logging it is interesting, but not
> > enough.
>
> Correct, I think we are all saying the same thing that is this log table is
> purely inserts so that we can see trends over time.
>
> >
> > There is some complexity there, because with many applications a small
> > table gets VACUUMed every few minutes, so the log table would become a
> > frequently updated table itself. I'd also suggest that we might want to
> > take account of the number of tuples removed by btree pre-split VACUUMs
> > also.
>
> Thinking on this a bit more, I suppose that this table really should allow for
> user defined triggers on it, so that a DBA can create partioning for it, not
> to mention being able to move it off into it's own tablespace.
>
>
> >
> > I also like the idea of a single scheduler and multiple child workers.
> >
> > The basic architecture is clear and obviously beneficial. What worries
> > me is how the scheduler will work; there seems to be as many ideas as we
> > have hackers. I'm wondering if we should provide the facility of a
> > pluggable scheduler? That way you'd be able to fine tune the schedule to
> > both the application and to the business requirements. That would allow
> > integration with external workflow engines and job schedulers, for when
> > VACUUMs need to not-conflict with external events.
> >
> > If no scheduler has been defined, just use a fairly simple default.
> >
> > The three main questions are
> > - what is the maximum size of VACUUM that can start *now*
>
> How can we determine this given we have no real knowledge of the upcoming
> adverse IO conditions ?
>
> > - can *this* VACUUM start now?
> > - which is the next VACUUM to run?
> >
> > If we have an API that allows those 3 questions to be asked, then a
> > scheduler plug-in could supply the answers. That way any complex
> > application rules (table A is available for VACUUM now for next 60 mins,
> > table B is in constant use so we must use vacuum_delay), external events
> > (long running reports have now finished, OK to VACUUM), time-based rules
> > (e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
> > first 3 days of the each month is financial accounting close) can be
> > specified.
>
> Another thought, is it at all possible to do a partial vacuum? ie spend the
> next 30 minutes vacuuming foo table, and update the fsm with what hew have
> learned over the 30 mins, even if we have not done a full table scan ?
>
>
> --
>
>
> Darcy Buskermolen
> The PostgreSQL company, Command Prompt Inc.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jan Muszynski 2007-01-19 21:53:50 Multiple column index usage question
Previous Message Devrim GUNDUZ 2007-01-19 21:31:38 Re: Postgresql 8.0.10 on FC4

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2007-01-19 21:53:09 Re: Planning aggregates which require sorted or distinct
Previous Message Karen Hill 2007-01-19 20:15:19 Re: Planning aggregates which require sorted or distinct