Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-18 15:54:19
Message-ID: CA+TgmobX9c5_1rR3CKQ_-y=AkFd_MP-XxS5SHGSX=awMFz4XdQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 17, 2023 at 5:56 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Why do you think that?
>
> For the reasons I gave about statistics, random sampling, the central
> limit theorem. All that stuff. This matches the experience of Andres.
> And is obviously the only explanation behind the reliance on
> antiwraparound autovacuums for cleaning up bloat in larger OLTP
> databases. It just fits: the dead tuples approach can sometimes be so
> completely wrong that even an alternative triggering condition based
> on something that is virtually unrelated to the thing we actually care
> about can do much better in practice. Consistently, reliably, for a
> given table/workload.

Hmm, I don't know. I have no intuition one way or the other for
whether we're undercounting dead tuples, and I don't understand what
would cause us to do that. I thought that we tracked that accurately,
as part of the statistics system, not by sampling
(pg_stat_all_tables.n_dead_tup).

But I think there are a number of other explanations for why we tend
to rely on antiwraparound vacuums more than we should.
Auto-cancellation. Skipping tables that are locked, or pages that are
pinned. A cost limit that is too low relative to the size of the
database, so that eventually all tables are in wraparound danger all
the time. The fact that we can vacuum tables uselessly, without
accomplishing anything, because the XID horizon is holding back the
removal of the dead tuples, but we don't know that, so we just try to
vacuum anyway. And then we repeat that useless work in an infinite
loop. The fact that the threshold at which the system thinks a vacuum
needs to happen grows linearly with table size via
autovacuum_vacuum_scale_factor, which gets too big too fast, so that
eventually the dead-tuple trigger never fires at all, or at least not
before XID age does.
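To put rough numbers on that last point: with the stock settings
(autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor =
0.2), a table with a billion live rows doesn't qualify for a
dead-tuple vacuum until it has accumulated about 200 million dead
tuples.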

I think we ought to fire autovacuum_vacuum_scale_factor out of an
airlock. It's not the right model, and I think many people have been
aware that it's not the right model for a decade, and we haven't been
creative enough to come up with anything better. We *know* that you
have to lower this value for large tables or they just don't get
vacuumed often enough. That means we have some idea how often they
ought to be vacuumed. I'm sure I'm not the person who has the best
intuition on that point, but I bet people who have been responsible
for large production systems have some decent ideas in that area. We
should find out what that intuition is and come up with a new formula
that matches it.

e.g.

1. When computing autovacuum_vacuum_threshold + table_size *
autovacuum_vacuum_scale_factor, if the result exceeds the value of a
new parameter autovacuum_vacuum_maximum_threshold, then clamp the
result to that value.

2. When computing autovacuum_vacuum_threshold + table_size *
autovacuum_vacuum_scale_factor, if the result exceeds 80% of the
number of dead TIDs we could store, clamp it to that value (80% of
the dead-TID capacity).

3. Change the formula altogether to use a square root or a cube root
or a logarithm somewhere.
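
To make that a bit more concrete, here is a rough sketch, not a patch:
the parameter names, the hard cap, and the sqrt multiplier below are
all invented for illustration.

#include <math.h>

/*
 * Sketch only.  base_thresh and scale_factor play the role of today's
 * autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor;
 * max_thresh stands in for a hypothetical new GUC along the lines of
 * autovacuum_vacuum_maximum_threshold, and dead_tid_capacity for
 * however many dead TIDs we could actually remember.
 */
static double
vacuum_threshold_sketch(double reltuples, double base_thresh,
                        double scale_factor, double max_thresh,
                        double dead_tid_capacity)
{
    /* today's formula: threshold + scale factor * table size */
    double      thresh = base_thresh + scale_factor * reltuples;

    /* option 1: clamp to an explicitly configured ceiling */
    thresh = fmin(thresh, max_thresh);

    /* option 2: clamp to 80% of the dead TIDs we could store */
    thresh = fmin(thresh, 0.8 * dead_tid_capacity);

    /* option 3: grow sublinearly (sqrt here; the multiplier is made up) */
    thresh = fmin(thresh, base_thresh + 10.0 * sqrt(reltuples));

    return thresh;
}

For a billion-row table, the sqrt variant with that made-up multiplier
works out to roughly 10 * sqrt(10^9), about 316,000 dead tuples,
rather than the 200 million that scale_factor = 0.2 gives. Whether
that particular curve is right is exactly the thing people running
big systems should weigh in on.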

I think we also ought to invent some sort of better cost limit system
that doesn't shoot you in the foot automatically as the database
grows. Nobody actually wants to limit the rate at which the database
vacuums stuff to a constant. What they really want to do is limit it
to a rate that is somewhat faster than the minimum rate needed to
avoid disaster. We should try to develop metrics for whether vacuum is
keeping up. I think one good one would be duty cycle -- if we have N
vacuum workers, then over a period of K seconds we could have done as
much as N*K process-seconds of vacuum work, and as little as 0. So
figure out how many seconds of vacuum work we actually did, and divide
that by N*K to get a percentage. If it's over, say, 90%, then we are
not keeping up. We should dynamically raise the cost limit until we
do. And drop it back down later when things look better.
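
A sketch of that feedback loop, assuming we had bookkeeping for how
busy the workers actually were; the 90% and 50% cutoffs and the floor
of 200 (the stock vacuum_cost_limit) are just illustrative numbers:

/*
 * Sketch only.  busy_seconds is how many process-seconds of vacuum
 * work the N workers did over the last interval_secs.
 */
static int
adjust_cost_limit_sketch(double busy_seconds, int nworkers,
                         double interval_secs, int cost_limit)
{
    double      duty_cycle = busy_seconds / (nworkers * interval_secs);

    if (duty_cycle > 0.90)
        cost_limit *= 2;        /* not keeping up: let vacuum go faster */
    else if (duty_cycle < 0.50 && cost_limit > 200)
        cost_limit /= 2;        /* plenty of slack: drop back down */

    return cost_limit;
}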

I don't actually see any reason why the dead tuple count, even
computed in a relatively stupid way, isn't fundamentally good enough
to get all tables vacuumed before we hit the XID age cutoff. It
doesn't actually do that right now, but I feel like that must be
because we're doing other stupid things, not because there's anything
that terrible about the metric as such. Maybe that's wrong, but I find
it hard to imagine.
If I imagine a world where vacuum always gets started when the number
of dead tuples hits some reasonable bound (rather than the
unreasonable bound that the scale factor stuff computes) and it always
cleans up those dead tuples (instead of doing a lot of work to clean
up nothing at all, or doing a lot of work to clean up only a small
fraction of those dead tuples, or cancelling itself, or skipping the
table that has the problem because it's locked, or running with an
unreasonably low cost limit, or otherwise being unable to GET THE JOB
DONE) then how do we ever reach autovacuum_freeze_max_age? I think it
would still be possible, but only if the XID consumption rate of the
server is so high that we chew through 300 million XIDs in the time
it takes to perform an un-throttled vacuum of the table. I think
that's a real threat and will probably be a bigger one in ten years,
but it's only one of many things that are going wrong right now.
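For a rough sense of scale, using an assumed rate rather than a
measured one: at a sustained 10,000 XIDs/second, 300 million XIDs take
300,000,000 / 10,000 = 30,000 seconds, a little over 8 hours, so the
un-throttled vacuum would have to run longer than that for us to lose
the race.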

> Even if we were omniscient, we still wouldn't be omnipotent.

A sound theological point!

> We're
> still subject to the laws of physics. VACUUM would still be something
> that more or less works at the level of the whole table, or not at
> all. So being omniscient seems kinda overrated to me. Adding more
> information does not in general lead to better outcomes.

Yeah, I think that's true. In particular, it's not much use being
omniscient but stupid. It would be better to have limited information
and be smart about what you did with it.

> What I'm arguing is that principally relying on any one thing is
> garbage. If you have only one thing that creates pressure to VACUUM
> then there can be a big impact whenever it turns out to be completely
> wrong. Whereas if VACUUM can run because of (say) 3 moderate signals
> taken together, then it's much less likely that we'll be completely
> wrong. In general my emphasis is on avoiding disaster in all its
> forms. Vacuuming somewhat early more often is perhaps suboptimal, but
> far from a disaster. It's the kind of thing that we can manage.

True, although it can be overdone. An extra vacuum on a big table with
some large indexes that end up getting scanned can be very expensive
even if the table itself is almost entirely all-visible. We can't
afford to make too many mistakes in the direction of vacuuming early
in such cases.

--
Robert Haas
EDB: http://www.enterprisedb.com
