Re: tuning autovacuum

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuning autovacuum
Date: 2011-10-14 16:45:22
Message-ID: 201110141645.p9EGjMt24767@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Jun 8, 2011 at 10:55 PM, Euler Taveira de Oliveira
> <euler(at)timbira(dot)com> wrote:
> > Em 08-06-2011 20:35, Robert Haas escreveu:
> >> Is the hint correct? ?I mean, what if there were 100 small tables that
> >> needed vacuuming all at the same time. ?We'd hit this limit no matter
> >> how high you set autovacuum_max_workers, but it wouldn't be right to
> >> set it to 101 just because every once in a blue moon you might trip
> >> over the limit.
> >>
> > I think so. You are picturing a scene with only one message. It is the same
> > case of the too-frequent-checkpoint messages; i.e., you should look if those
> > messages have some periodicity.
>
> Yeah, maybe. I'm just not sure there would be an easy way for users
> to judge when they should or should not make a change.
>
> >> I think it'd be really useful to expose some more data in this area
> >> though. ?One random idea is - remember the time at which a table was
> >> first observed to need vacuuming. Clear the timestamp when it gets
> >> vacuumed. ?Then you can do:
> >>
> > Hmmm. But this fine grained information alone doesn't help tuning the number
> > of autovacuum workers. I consider counters easier to implement and simpler
> > to analyze. But the timestamp idea has its merit because we already have a
> > similar statistic (last timestamp table was vacuumed or analyzed).
>
> Well, it won't directly tell you how many you need. But certainly if
> you see things getting further and further behind, you know you need
> more.
>
> Or, alternatively, you need to reduce vacuum_cost_delay. IME, that's
> actually the most common cause of this problem.

This thread from June died because there was concern about the overhead
of additional autovacuum statistics, and I have to say I am not
super-excited about it either because most users will not use them.

Ideally we would have something like checkpoint_warning that warns users
in the log when there are too few autovacuum workers and cleanup is
being delayed.

The big trick is how to accurately measure this. The amount of time
that a table waits to be vacuumed probably isn't relevant enough --- it
might have been days since it was last vacuumed, and waiting 10 minutes
isn't a big deal, so it is hard to say what _scale_ we would give users
for that warning that would make sense. We could compare it to the time
since the last autovacuum, but if the table is suddently heavily
modified, that doesn't help either.

I think it has to drive off of the 'n_dead_tuples' statistic value for the
table. I was toying with the idea of comparing the n_dead_tuples value
at the time the table is first scanned for autovacuum consideration, and
the value at the time an autovacuum worker actually starts scanning the
table.

The problem there is that if someone does a massive DELETE in that time
interval, or does an UPDATE on all the rows, it would think that
autovacuum should have been there to mark some dead rows, but it was
not. In the case of DELETE, having autovacuum work earlier would not
have helped, but it would have helped in the UPDATE case.

We could look at table size growth during that period. If the
autovacuum had run earlier, we would have used that dead space, but is
wasn't recorded by autovacuum yet, but again, it seems vague.

Ideas?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2011-10-14 16:48:03 Re: Core Extensions relocation
Previous Message Tom Lane 2011-10-14 16:08:54 Re: Isolation tests still falling over routinely