Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-18 18:36:41
Message-ID: CAH2-Wz=Ouqe8bXUYVhROD7JiW-zvhh5mApF=dbRWg1Ta3m2gCg@mail.gmail.com

On Thu, Mar 18, 2021 at 3:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> If we have a constant threshold of 1 billion transactions, a vacuum
> operation might not be an anti-wraparound vacuum, and might not even
> be an aggressive vacuum, depending on the autovacuum_freeze_max_age
> value. Given the purpose of skipping index vacuuming in this case, I
> think it doesn't make sense to have a non-aggressive vacuum skip
> index vacuuming, since it might not be able to advance relfrozenxid.
> If we have a constant threshold, 2 billion transactions (the maximum
> value of autovacuum_freeze_max_age) seems to work.

I like the idea of not making the behavior a special thing that only
happens with a certain variety of VACUUM operation (non-aggressive or
anti-wraparound VACUUMs). Just having a very high threshold should be
enough.

Even if we're not going to be able to advance relfrozenxid, we'll
still finish much earlier and let a new anti-wraparound vacuum take
place that will do that -- and will be able to reuse much of the work
of the original VACUUM. Of course this anti-wraparound vacuum will
also skip index vacuuming from the start (whereas the first VACUUM may
well have done some index vacuuming before deciding to stop, in order
to finish sooner).

There is a risk in having the limit be too high, though. We need to
give VACUUM time to reach two_pass_strategy() to notice the problem
and act (maybe each call to lazy_vacuum_all_indexes() takes a long
time). Also, while it's possible (and perhaps even likely) that cases
that use this emergency mechanism will be able to end the VACUUM
immediately (because there is enough maintenance_work_mem to make the
first call to two_pass_strategy() also the last call), that won't
always be how it works. Even deciding to stop index vacuuming (and
heap vacuuming) may not be enough to avert disaster if left too late
-- because we may still have to do a lot of table pruning. In cases
where there is not nearly enough maintenance_work_mem we will get
through the table a lot faster once we decide to skip indexes, but
there is some risk that even this will not be fast enough.
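
To put a number on "enough maintenance_work_mem": each dead tuple TID
costs 6 bytes (sizeof(ItemPointerData)), and maintenance_work_mem is
measured in kilobytes. So the capacity calculation amounts to
something like this (variable name invented, just to illustrate the
arithmetic):

    /* maintenance_work_mem is in KB; each dead tuple TID is 6 bytes */
    long    max_dead_tuples;

    max_dead_tuples = (maintenance_work_mem * 1024L) /
        sizeof(ItemPointerData);

    /*
     * With the 64MB default that's roughly 11 million TIDs.  A table
     * with fewer dead tuples than that gets exactly one call to
     * two_pass_strategy(), so the first call really is the last call.
     */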

How about 1.8 billion XIDs? That's the maximum value of
autovacuum_freeze_max_age (2 billion) minus the default value (200
million). That is high enough that it seems almost impossible for this
emergency mechanism to hurt rather than help. At the same time it is
low enough to leave some time to finish off the work that is truly
required.

> > This seems like a good idea because we should try to avoid changing
> > back to index vacuuming having decided to skip it once.
>
> Once we have decided to skip index vacuuming because relfrozenxid is
> too old, the decision is never changed within the same vacuum
> operation, right? Because relfrozenxid is advanced at the end of the
> vacuum.

I see no reason why it would be fundamentally incorrect to teach
two_pass_strategy() to make new and independent decisions about doing
index vacuuming on each call. I just don't think it makes any sense to
do so, practically speaking. Why would we even *want* to decide to not
do index vacuuming, and then change our mind about it (resuming index
vacuuming for later heap blocks)? That
sounds a bit too much like me!
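
Concretely, I imagine this as a one-way flag in the per-VACUUM state,
along these lines (struct and field names are made up for the sketch,
reusing the hypothetical xid_failsafe_triggered() from above):

    typedef struct VacSkipState
    {
        bool    failsafe_active;    /* one-way: never reset once set */
    } VacSkipState;

    static void
    two_pass_strategy(Relation rel, VacSkipState *vacskip)
    {
        /* Re-check the XID age on every call... */
        if (!vacskip->failsafe_active && xid_failsafe_triggered(rel))
            vacskip->failsafe_active = true;    /* ...never go back */

        if (vacskip->failsafe_active)
        {
            /* Emergency: skip index vacuuming (and heap vacuuming) */
            return;
        }

        /* Normal path: vacuum all indexes, then the heap */
    }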

There is another reason to never go back to index vacuuming: we should
have an ereport() at the point that we decide to not do index
vacuuming (or not do additional index vacuuming) inside
two_pass_strategy(). This should deliver an unpleasant message to the
DBA. The message is (in my own informal language): An emergency
failsafe mechanism made VACUUM skip index vacuuming, just to avoid
likely XID wraparound failure. This is not supposed to happen.
Consider tuning autovacuum settings, especially if you see this
message regularly.
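
In code form, I'm thinking of something like the following (the exact
wording is a first draft, as is the choice of WARNING as the elevel):

    ereport(WARNING,
            (errmsg("abandoning index vacuuming of table \"%s\" as a failsafe to avoid XID wraparound failure",
                    RelationGetRelationName(rel)),
             errhint("Consider tuning autovacuum settings, especially if you see this message regularly.")));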

Obviously the reason to delay the decision is that we cannot easily
predict how long any given VACUUM will take (or even how long it will
take just to reach two_pass_strategy()). Nor can we really hope to
understand how many
XIDs will be consumed in that time. So rather than trying to
understand all that, we can instead just wait until we have reliable
information. It is true that the risk of waiting until it's too late
to avert disaster exists (which is why 1.8 billion XIDs seems like a
good threshold to me), but there is only so much we can do about that.
We don't need it to be perfect, just much better.

In my experience, anti-wraparound VACUUM scenarios all have an
"accident chain", which is a concept from the world of aviation and
safety-critical systems:

https://en.wikipedia.org/wiki/Chain_of_events_(accident_analysis)

They usually involve some *combination* of Postgres problems,
application code problems, and DBA error. Not any one thing. I've seen
problems with application code that runs DDL at scheduled intervals,
which interacts badly with vacuum -- but only really on the rare
occasions when freezing is required! I've also seen a buggy
hand-written upsert function that artificially burned through XIDs at
a furious pace. So we really want this mechanism to not rely on the
system being in its typical state, if at all possible. When index
vacuuming is skipped due to concerns about XID wraparound, it should
really be a rare emergency, and an unpleasant surprise to the DBA.
Nobody should rely on this mechanism consistently.

--
Peter Geoghegan
