Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2022-10-20 18:52:03
Message-ID: CAH2-Wz=AfJKAwRq-PV-3WTU3mg-RRfnbMEdga0CBmwAXRmebcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 20, 2022 at 11:09 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> The terminology is getting slightly confusing here: by
> "antiwraparound", you mean that it's not skipping unfrozen pages, and
> therefore is able to advance relfrozenxid. Whereas the
> PROC_VACUUM_FOR_WRAPAROUND is the same thing, except done with greater
> urgency because wraparound is imminent. Right?

Not really.

I started this thread to discuss a behavior in autovacuum.c and proc.c
(the autocancellation behavior), which is, strictly speaking, not
related to the current vacuumlazy.c behavior we call aggressive mode
VACUUM. Various hackers have in the past described antiwraparound
autovacuum as "implying aggressive", which makes sense; what's the
point in doing an antiwraparound autovacuum that can almost never
advance relfrozenxid?

It is nevertheless true that antiwraparound autovacuum is an
independent behavior to aggressive VACUUM. The former is an autovacuum
thing, and the latter is a VACUUM thing. That's just how it works,
mechanically.

If this division seems artificial or pedantic to you, then consider
the fact that you can quite easily get a non-aggressive antiwraparound
autovacuum by using the storage option called
autovacuum_freeze_max_age (instead of the GUC):

https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com

This is even a case where we'll output a distinct description in the
server log when autovacuum logging is enabled and gets triggered. So
while there may be no point in an antiwraparound autovacuum that is
non-aggressive, that doesn't stop them from happening. Regardless of
whether or not that's an intended behavior, that's just how the
mechanism has been constructed.

> > There is no inherent reason why we have to do both
> > things at exactly the same XID-age-wise time. But there is reason to
> > think that doing so could make matters worse rather than better [1].
>
> Can you explain?

Why should the special autocancellation behavior for antiwraparound
autovacuums kick in at exactly the same point that we first launch an
antiwraparound autovacuum? Maybe that aggressive intervention will be
needed, in the end, but why start there?

With my patch series, antiwraparound autovacuums still occur, but
they're confined to things like static tables -- things that are
pretty much edge cases. They still need to behave sensibly (i.e.
reliably advance relfrozenxid based on some principled approach), but
now they're more like "an autovacuum that happens because no other
condition triggered an autovacuum". To some degree this is already the
case, but I'd like to be more deliberate about it.

Leaving my patch series aside, I still don't think that it makes sense
to make it impossible to auto-cancel antiwraparound autovacuums,
across the board, regardless of the underlying table age. We still
need something like that, but why not give a still-cancellable
autovacuum worker a chance to resolve the problem? Why take a risk of
causing much bigger problems (e.g., blocking automated DDL that blocks
simple SELECT queries) before the point that that starts to look like
the lesser risk (compared to hitting xidStopLimit)?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-10-20 19:12:12 Re: cross-platform pg_basebackup
Previous Message Robert Haas 2022-10-20 18:51:21 Re: Avoid memory leaks during base backups