Re: Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?
Date: 2016-02-24 06:23:30
Message-ID: CA+TgmoY9VL6o4i82RbJp7gzDt5kPApTeWzjTecCBf+pqJu3=2A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 12, 2016 at 6:12 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> right now the defaults for autovacuum cost limiting are so low that they
> regularly cause problems for our users. It's not exactly obvious that
> any installation above a couple gigabytes definitely needs to change
> autovacuum_vacuum_cost_delay &
> autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
> anti-wraparound/full table vacuums basically take forever with the
> default settings.
>
> On the other hand we don't want a database of a couple hundred megabytes
> to be vacuumed as fast as possible and trash the poor tiny system. So we
> can't just massively increase the limits by default; although I believe
> some default adjustment would be appropriate anyway.
>
> I wonder if it makes sense to compute the delays / limits in relation to
> either cluster or relation size. If you have a 10 TB table, you
> obviously don't want to scan with a few megabytes a second, which the
> default settings will do for you. With that in mind we could just go for
> something like the autovacuum_*_scale_factor settings. But e.g. for
> partitioned workloads with hundreds of tables in the couple-gigabyte
> range that'd not work that well.
>
> Somehow computing the speed in relation to the cluster/database size is
> probably possible, but I wonder how we can do so without constantly
> re-computing something relatively expensive?
>
> Thoughts?

Thanks for bringing this up. I fully agree we should try to do
something about this. This comes up quite regularly in EnterpriseDB
support discussions, and I'm sure lots of other people have problems
with it too. It seems to me that what we really want to do is try to
finish vacuuming the table before it needs to be vacuumed again.
For the sake of simplicity, just consider the anti-wraparound case for
a second. If it takes three days to vacuum the table and we consume
200 million XIDs in two days, we are pretty clearly not vacuuming fast
enough.
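
(To spell out the arithmetic with the defaults: autovacuum_freeze_max_age
is 200 million, so burning 200 million XIDs in two days means an
anti-wraparound vacuum of the table comes due roughly every two days. A
vacuum that takes three days therefore can never keep up; it would have
to run at least 3/2 = 1.5x faster just to break even.)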

I think we should do something similar to what we do for checkpoints.
We estimate when the table will next need vacuuming based on the rate
of XID advancement and the rate at which dead tuples are being
created. We can also estimate what percentage of the relation we've
vacuumed and derive some estimate of when we'll be done - perhaps
assuming only one index pass, for the sake of simplicity. If we're
behind, we should vacuum faster to try to catch up. We could even try
to include some fudge factor in the calculation - e.g. if the time
until the next vacuum is estimated to be 30 hours from the start of
the current vacuum, we try to make the current vacuum finish in no
more than 75% * 30 hours = 22.5 hours.
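
To make that concrete, here is a minimal sketch of such an "on schedule"
check. This is purely hypothetical code for illustration - none of these
names exist in PostgreSQL - loosely mirroring the way the checkpointer
compares its write progress against elapsed time:

#include <stdio.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, not actual PostgreSQL code: are we on track to
 * finish the current vacuum within 75% of the predicted interval until
 * the table next needs vacuuming?
 */
static bool
vacuum_on_schedule(double blocks_done, double total_blocks,
                   double elapsed_secs, double predicted_interval_secs)
{
    double  work_fraction = blocks_done / total_blocks;
    double  budget_secs = 0.75 * predicted_interval_secs;  /* fudge factor */
    double  time_fraction = elapsed_secs / budget_secs;

    /* We're behind if we've used more of the time budget than of the work. */
    return work_fraction >= time_fraction;
}

int
main(void)
{
    /*
     * The example above: the next vacuum is predicted 30 hours out, so the
     * budget is 22.5 hours.  Ten hours in with 30% of the blocks done we're
     * behind (prints 0), so the worker should speed up to catch up.
     */
    printf("%d\n", vacuum_on_schedule(300000.0, 1000000.0,
                                      10.0 * 3600.0, 30.0 * 3600.0));
    return 0;
}

When that check fails, the worker could temporarily raise its effective
cost limit (or shrink its cost delay) until it catches back up, much the
way the checkpointer speeds up its writes when it falls behind its target.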

I think this is better than your proposal to scale it just based on
the size of the relation, because it may be fine for the vacuum to run
slowly if we're creating very few dead tuples and consuming very few
XIDs. IME, there's one very specific scenario where the wheels come
off, and that's when the table doesn't get fully vacuumed before it's
due to be vacuumed again. Of course, anything we did here wouldn't be
perfect - it would all be based on estimates - but I bet we could make
things a lot better. There's an even more global version of this
problem, which is that you could have a situation where, whenever any
given table gets vacuumed, the vacuum runs quickly enough to finish
before that table is due to be vacuumed again, but there are so many
large tables that overall we don't make enough progress. It would be
nice to fix that, too, but even
something simple that ignored that more global problem would help a
lot of people.
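
(Made-up numbers to illustrate the global version: suppose forty large
tables each need 10 hours of vacuuming and each comes due every 100
hours. Any one of those vacuums, taken by itself, finishes with plenty
of margin, but together they demand 400 worker-hours of vacuuming per
100-hour window - more than three autovacuum workers sharing a single
cost budget can deliver - so some tables fall behind no matter how well
each individual vacuum is paced.)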

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
