Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-12 21:08:06
Message-ID: CA+Tgmobe03hf1R4Sjb85Mx-MCs_3n3j51x4GFyThLujY6o=xrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 12, 2023 at 2:22 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> All that I really want to do here is give an autovacuum that *can* be
> auto cancelled *some* non-zero chance to succeed with these kinds of
> tables. TRUNCATE completes immediately, so the AEL is no big deal.
> Except when it's blocked behind an antiwraparound autovacuum. That
> kind of interaction is occasionally just disastrous. Even just the
> tiniest bit of wiggle room could avoid it in most cases, possibly even
> almost all cases.

I doubt it. Wiggle room that's based on the XID threshold being
different for one behavior vs. another can easily fail to produce any
benefit, because there's no guarantee that the autovacuum launcher
will ever try to launch a worker against that table while the XID is
in the range where you'd get one behavior and not the other. I've
long thought that the fact that vacuum_freeze_table_age is documented
as capped at 0.95 * autovacuum_freeze_max_age is silly for just this
reason. The interval that you're proposing is much wider so the
chances of getting a benefit are greater, but supposing that it's
going to solve it in most cases seems like an exercise in unwarranted
optimism.

In fact, I would guess that in fact it will very rarely solve the
problem. Normally, the XID age of a table never reaches
autovacuum_freeze_max_age in the first place. If it does, there's some
reason. Maybe there's a really old open transaction or an abandon
replication slot or an unresolved 2PC transaction. Maybe the
autovacuum system is overloaded and no table is getting visited
regularly because the system just can't keep up. Or maybe there are
regular AELs being taken on the table at issue. If there's only an AEL
taken against a table once in blue moon, some autovacuum attempt ought
to succeed before we reach autovacuum_freeze_max_age. Flipping that
around, if we reach autovacuum_freeze_max_age without advancing
relfrozenxid, and an AEL shows up behind us in the lock queue, it's
really likely that the reason *why* we've reached
autovacuum_freeze_max_age is that this same thing has happened to
every previous autovacuum attempt and they all cancelled themselves.
If we cancel ourselves too, we're just postponing resolution of the
problem to some future point when we decide to stop cancelling
ourselves. That's not a win.

> I think that users will really appreciate having only one kind of
> VACUUM/autovacuum (since the other patch gets rid of discrete
> aggressive mode VACUUMs). I want "table age autovacuuming" (as I
> propose to call it) come to be seen as not any different to any other
> autovacuum, such as an "insert tuples" autovacuum or a "dead tuples"
> autovacuum. The difference is only in how autovacuum.c triggers the
> VACUUM, not in any runtime behavior. That's an important goal here.

I don't agree with that goal. I think that having different kinds of
autovacuums with different, identifiable names and corresponding,
easily-identifiable behaviors is really important for troubleshooting.
Trying to remove those distinctions and make everything look the same
will not keep autovacuum from getting itself into trouble. It will
just make it harder to understand what's happening when it does.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-01-12 22:07:55 Re: Using WaitEventSet in the postmaster
Previous Message Christophe Pettus 2023-01-12 21:07:04 Re: errdetail vs errdetail_log?