Re: autovacuum prioritization

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: autovacuum prioritization
Date: 2022-01-21 00:43:11
Message-ID: CA+Tgmobq6Obh+Va_yT8qY=n72AV7vwYeuHw+dce44NA-xnzCHA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 20, 2022 at 6:54 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I agree that it doesn't follow that table A should be more of a
> priority than table B, either because it has a greater age, or because
> its age happens to exceed some actually-arbitrary threshold. But I
> will point out that my ongoing work on freezing does make something
> along these lines much more plausible. As I said over on that thread,
> there is now a kind of "natural variation" among tables, in terms of
> relfrozenxid, as a result of tracking the actual oldest XID, and using
> that (plus the emphasis on advancing relfrozenxid wherever possible).
> And so we'll have a much better idea of what's going on with each
> table -- it's typically a precise XID value from the table, from the
> recent past.

I agree.

> Since we now have the failsafe, the scheduling algorithm can afford to
> not give too much special attention to table age until we're maybe
> over the 1 billion age mark -- or even 1.5 billion+. But once the
> scheduling stuff starts to give table age special attention, it should
> probably become the dominant consideration, by far, completely
> drowning out any signals about bloat. It's kinda never really supposed
> to get that high, so when we do end up there it is reasonable to fully
> freak out. Unlike the bloat criteria, the wraparound safety criteria
> doesn't seem to have much recognizable space between not worrying at
> all, and freaking out.

I do not agree with all of this. First, on general principle, I think
sharp edges are bad. If a table had priority 0 for autovacuum 10
minutes ago, it can't now have priority one million bazillion. If
you're saying that the priority of wraparound needs to, in the limit,
become higher than any bloat-based priority, that is reasonable. Bloat
never causes a hard stop in the way that wraparound does, even if the
practical effects are not much different. However, if you're saying
that the priority should shoot up to the maximum all at once, I don't
agree with that at all. Second, I think it is good and appropriate to
leave a lot of slop in the mechanism. As you point out later, we don't
really know whether any of our estimates for how long things will take
are accurate, and therefore we don't know whether the time we've
budgeted will be sufficient. We need to leave lots of slop so that
even if we turn out to be quite wrong, we don't hit a wall.
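
A minimal sketch of the "no sharp edges" idea above, under stated assumptions: the function name, the ramp endpoints, the quadratic shape, and both priority scales are hypothetical illustrations, not anything autovacuum implements. It shows a wraparound priority that starts at zero, rises continuously with table age, and eventually exceeds a capped bloat priority, so a small change in age never produces a huge jump.

```python
# Hypothetical cap on any bloat-based priority (assumption for illustration).
BLOAT_PRIORITY_CAP = 100.0


def wraparound_priority(age, ramp_start=300_000_000, ramp_end=2_000_000_000,
                        max_priority=1000.0):
    """Priority that grows smoothly with table age (all values hypothetical).

    Below ramp_start the priority is 0; between ramp_start and ramp_end it
    rises quadratically; at ramp_end and beyond it stays at max_priority.
    """
    if age <= ramp_start:
        return 0.0
    frac = min(1.0, (age - ramp_start) / (ramp_end - ramp_start))
    return max_priority * frac ** 2


# Halfway up the ramp the priority is a quarter of the maximum.
midpoint = wraparound_priority(1_150_000_000)
# At the end of the ramp it outranks any bloat-based priority.
limit = wraparound_priority(2_000_000_000)
```

Because the curve is continuous, a table that was just under the ramp ten minutes ago gets only a slightly higher priority now, while the limit still dominates bloat.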

Also, it's worth keeping in mind that waiting longer to freak out is
not necessarily an advantage. It may well be that the only way the
problem will ever get resolved is by human intervention - going in and
fixing whatever dumb thing somebody did - e.g. resolving the pending
prepared transaction. In that sense, we might be best off freaking
out after a relatively small number of transactions, because that
might get some human being's attention. In a very real sense, if old
prepared transactions shut down the system after 100 million
transactions, users would probably be better off on average, because
the problems would get fixed before so much damage is done. I'm not
seriously proposing that as a design, but I think it's a mistake to
think that pushing off the day of reckoning is necessarily better.

All that being said, I do agree that trying to keep the table age
below 300 million is too conservative. I think we need to be
conservative now because we don't take the time that the table will
take to vacuum into account, and I think if we start thinking about it
as a target to finish vacuuming rather than a target to start
vacuuming, it can go significantly higher. But I would be disinclined
to go to, say, 1.5 billion. If the user hasn't taken any action when we
hit the 1 billion transaction mark, or really probably a lot sooner,
they're unlikely to wake up any time soon. I don't think there are
many systems out there where vacuum ages >1b are the result of the
system trying frantically to keep up and not having enough juice.
There are probably some, but most such cases are the result of
misconfiguration, user error, software failure, etc.
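
The difference between a target to start vacuuming and a target to finish vacuuming can be sketched with some arithmetic. This is only an illustration: the function name, the safety factor, and all the input numbers are assumptions, and the duration and XID consumption rate are exactly the kind of estimates that may turn out to be wrong, which is why the sketch multiplies in some slop.

```python
def latest_start_age(finish_target_age, est_vacuum_seconds, xids_per_second,
                     safety_factor=2.0):
    """Table age at which a vacuum should start so that it finishes
    before finish_target_age (hypothetical model).

    While the vacuum runs, the table keeps aging by roughly
    est_vacuum_seconds * xids_per_second transactions. The safety factor
    pads that budget in case either estimate is too optimistic.
    """
    budget = est_vacuum_seconds * xids_per_second * safety_factor
    return max(0, int(finish_target_age - budget))


# Hypothetical inputs: finish by age 1 billion, vacuum estimated at
# 10 hours, system consuming 5,000 XIDs per second.
start_age = latest_start_age(1_000_000_000, 10 * 3600, 5_000)
```

With these made-up inputs the vacuum would need to start by an age of 640 million; a small, quickly vacuumed table could wait far longer than a large one with the same finish target.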

> There is a related problem that you didn't mention:
> autovacuum_max_workers controls how many autovacuum workers can run at
> once, but there is no particular concern for whether or not running
> that many workers actually makes sense, in any given scenario. As a
> general rule, the system should probably be *capable* of running a
> large number of autovacuums at the same time, but never actually do
> that (because it just doesn't ever prove necessary). Better to have
> the option and never use it than need it and not have it.

I agree. And related to that, the more workers we have, the slower
each one goes, which I think is often counterintuitive for people, and
also often counterproductive. I'm sure there are cases where table A
is really big and needs to be vacuumed but not terribly urgently, and
table B is really small but needs to be vacuumed right now, and I/O
bandwidth is really tight. In that case, slowing down the vacuum on
table A so that the vacuum on table B can do its thing is the right
call. But what I think is more common is that we get more workers
because the first one is not getting the job done. And if they all get
slower then we're still not getting the job done, but at greater
expense.
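
To illustrate why more workers means each one goes slower: the autovacuum cost limit is shared among the running workers rather than granted to each. The sketch below is a deliberately simplified model that assumes an equal split and ignores the time spent doing actual work between sleeps; the function names and the table cost figure are hypothetical.

```python
def per_worker_cost_rate(cost_limit, cost_delay_ms, active_workers):
    """Cost units per second available to each worker (simplified model).

    Assumes the shared cost limit is split evenly among workers and that
    a worker sleeps cost_delay_ms each time it uses up its share.
    """
    if active_workers < 1:
        raise ValueError("active_workers must be at least 1")
    total_rate = cost_limit * 1000.0 / cost_delay_ms
    return total_rate / active_workers


def seconds_to_vacuum(total_cost_units, rate):
    """Time for one worker to get through a table at the given rate."""
    return total_cost_units / rate


# Hypothetical table costing 1 billion cost units to vacuum,
# with a cost limit of 200 and a cost delay of 2 ms.
table_cost = 1_000_000_000
alone = seconds_to_vacuum(table_cost, per_worker_cost_rate(200, 2, 1))
with_three_others = seconds_to_vacuum(table_cost, per_worker_cost_rate(200, 2, 4))
```

In this model the same table takes four times as long once three other workers are active, so adding workers does not help the table that was already behind.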

> > In the meantime, I think a sensible place to start would be to figure
> > out some system that makes sensible estimates of how soon we need to
> > address bloat, XID wraparound, and MXID wraparound for each table, and
> > some system that estimates how long each one will take to vacuum.
>
> I think that it's going to be hard to model how long index vacuuming
> will take accurately. And harder still to model which indexes will
> adversely impact the user in some way if we delay vacuuming some more.

Those are fair concerns. I assumed that if we knew the number of pages
in the index, which we do, it wouldn't be too hard to make an estimate
like this ... but you know more about this than I do, so tell me why
you think that won't work. It's perhaps worth noting that even a
somewhat poor estimate could be a big improvement over what we have
now.
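
As a rough sketch of the kind of estimate meant here, assuming page counts are available (for example from pg_class.relpages) and that a sustained pages-per-second rate can be guessed: the function name, the linear cost model, and the idea of passing the number of index passes as an input are all assumptions for illustration, and the model ignores everything that makes index vacuuming hard to predict.

```python
def estimate_vacuum_seconds(heap_pages, index_pages, index_passes,
                            pages_per_second):
    """Very rough vacuum duration estimate from page counts (hypothetical).

    heap_pages:       number of heap pages to scan
    index_pages:      list with the page count of each index
    index_passes:     how many times every index is expected to be scanned
    pages_per_second: assumed sustained processing rate
    """
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    total_pages = heap_pages + index_passes * sum(index_pages)
    return total_pages / pages_per_second


# Hypothetical table: 1M heap pages, two indexes, two index passes,
# processed at 10,000 pages per second.
estimate = estimate_vacuum_seconds(1_000_000, [200_000, 300_000], 2, 10_000)
```

Even an estimate this crude distinguishes a table that needs minutes from one that needs days, which is the improvement over having no estimate at all.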

> Might be more useful to start off by addressing how to spread out the
> burden of vacuuming over time. The needs of queries matters, but
> controlling costs matters too.
>
> One of the most effective techniques is to manually VACUUM when the
> system is naturally idle, like at night time. If that could be
> quasi-automated, or if the criteria used by autovacuum scheduling gave
> just a little weight to how busy the system is right now, then we
> would have more slack when the system becomes very busy.

I have thought about this approach but I'm not very hopeful about it
as a development direction. One problem is that we don't necessarily
know when the quiet times are, and another is that there might not
even be any quiet times. Still, neither of those problems by itself
would discourage me from attempting something in this area. The thing
that does discourage me is: if you have a quiet period, you can take
advantage of that to do vacuuming without any code changes at all.
You can just crontab a vacuum that runs with a reduced setting for
vacuum_freeze_table_age and vacuum_freeze_min_age during your nightly
quiet period and call it good.
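
A minimal sketch of what such a cron-driven job might build, assuming psql is on the PATH and connection details come from the environment. The function name, the database name, and the two reduced setting values are hypothetical; the sketch only constructs the command line and does not run it.

```python
def build_nightly_vacuum_command(dbname,
                                 freeze_min_age=5_000_000,
                                 freeze_table_age=15_000_000):
    """Build a psql command that vacuums a database with reduced
    freeze settings (values are hypothetical examples).

    Each -c runs as its own request in the same session, so the SET
    commands apply to the VACUUM that follows, and VACUUM is not
    wrapped in a transaction block together with them.
    """
    return [
        "psql",
        "-d", dbname,
        "-c", f"SET vacuum_freeze_min_age = {freeze_min_age}",
        "-c", f"SET vacuum_freeze_table_age = {freeze_table_age}",
        "-c", "VACUUM",
    ]


# Hypothetical database name.
cmd = build_nightly_vacuum_command("mydb")
```

A script scheduled from crontab for the quiet period could pass this list to subprocess.run; nothing in the server needs to change.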

The problem that I'm principally concerned about here is the case
where somebody had a system that was basically OK and then at some
point, bad things started to happen. At some point they realize
they're in trouble and try to get back on track. Very often,
autovacuum is actually the enemy in that situation: it insists on
consuming resources to vacuum the wrong stuff. Whatever we can do to
avoid such disastrous situations is all to the good, but since we
can't realistically expect to avoid them entirely, we need to improve
the behavior in the cases where they do happen.

--
Robert Haas
EDB: http://www.enterprisedb.com
