Re: Autovacuum improvements

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum improvements
Date: 2007-01-14 20:27:19
Message-ID: 20070114202719.GC7233@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Matthew T. O'Connor wrote:

> Alvaro Herrera wrote:
> >I've been thinking how to improve autovacuum so that we can convince
> >more people that it can be enabled by default.
>
> I would like to see it enabled by default too, however the reason it
> isn't already enabled by default is that it caused failures in the
> regression test when we tried to turn it on during the 8.2 dev cycle and
> it was too close to beta to fix everything. All this new machinery is
> great, but it doesn't address that problem.

See my reply to Tom on this topic.

> >pg_av_igroupmembers
> > groupid oid
> > month int
> > dom int
> > dow int
> > starttime timetz
> > endtime timetz
>
> This seems to assume that the start and end time for an interval will be
> on the same day, you probably need to specify a start month, dom, dow,
> time and an end month, dom, dow and time.

Actually, I was thinking that if you want intervals that cross day
boundaries, you just add more tuples (one which finishes at 23:59:59 and
another which starts at 00:00:00 the next day).

> Since this is modeled after cron, do we allow wild-cards, or any of the
> other cron tricks like */20 or 1-3,5,7,9-11?

Wildcards yes (using NULL), but not the rest because it would make the
autovacuum code responsible for parsing the values which I don't think
is a good idea. And it's not normalized anyway.

> Also your notation above is ambiguous, it took me a while to realize
> that pg_av_igroupmembers.groupid wasn't referencing the id from
> pg_av_tablegroup.

Hmm, yeah, that one is referencing pg_av_intervalgroup.

> >pg_av_schedule
> > tgroup oid
> > igroup oid
> > enabled bool
> > queue int
> > vac_base_thresh int
> > vac_scale_factor float
> > anl_base_thresh int
> > anl_scal_factor float
> > vac_cost_delay int
> > vac_cost_limit int
> > freeze_min_age int
> > freeze_max_age int
> >
>
> What is queue for?

Sorry, that was part of the queue stuff which I then deleted :-)

> >So the scheduler, at startup, loads the whole schedule in memory, and
> >then wakes up at reasonable intervals and checks whether these equations
> >hold for some of the tables it's monitoring. If they do, then launch a
> >new worker process to do the job.
> >
> >We need a mechanism for having the scheduler rescan the schedule when a
> >user modifies the catalog -- maybe having a trigger that sends a signal
> >to the process is good enough (implementation detail: the signal must be
> >routed via the postmaster, since the backend cannot hope to know the
> >scheduler's PID. This is easy enough to do.)
>
> This all looks reasonable if not a bit complex. Question, what happens
> to the current pg_autovacuum relation?

I had two ideas: one was to make pg_autovacuum hold default config for
all tables not mentioned in any group, so sites which are OK with 8.2's
representation can still use it. The other idea was to remove it and
replace it with this mechanism.

> Also what about system defaults, will we have a hard coded default
> interval of always on, and one default table group that contains all the
> tables with one default entry in pg_av_schedule?

Yes, that's what I had in mind.

> I think we need more discussion on scheduling, we need to make sure this
> solves the vacuum starvation problem. Does the launcher process
> consider each row in pg_av_schedule that applies at the current time
> separately? That is say there are three entries in pg_av_schedule that
> apply right now, does that mean that the launcher can fire off three
> different vacuums? Perhaps we need to add a column to pg_av_tablegroup
> that specifies the max number of concurrent worker processes for this
> table group.

My idea was to assign each table, or maybe each group, to a queue, and
then have as much workers as there are queues. So you could put them
all in a single queue and it would mean there can be at most one vacuum
running at any time. Or you could put each group in a queue, and then
there could be as many workers as there are groups. Or you could mix.

And also there would be a "autovac concurrency limit", which would be
a GUC var saying how many vacuums to have at any time.

> Also, I don't think we need the concept of queues as described in recent
> threads. I think the idea of the queues was the the system would be
> able to automatically find small tables and vacuum them frequently, in
> this proposal the admin would have to create a group for small tables
> and manually add tables to the group and make sure that there are enough
> worker processes for that group to prevent vacuum starvation. Perhaps
> we can create a dynamic group that includes all tables with less than a
> certain number of rows or blocks?

Yeah, my idea of "queues" was slightly different than the queues that
were being discussed. I was thinking that queues would just be a means
to group the groups to limit concurrency while at the same time prevent
starvation.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2007-01-14 20:55:43 Re: [PATCHES] vcbuild optional packages
Previous Message Tom Lane 2007-01-14 20:19:16 Re: Autovacuum improvements

Browse pgsql-patches by date

  From Date Subject
Next Message Alvaro Herrera 2007-01-14 20:55:43 Re: [PATCHES] vcbuild optional packages
Previous Message Tom Lane 2007-01-14 20:19:16 Re: Autovacuum improvements