Re: do only critical work during single-user vacuum?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: do only critical work during single-user vacuum?
Date: 2022-02-15 17:28:47
Message-ID: CAH2-WzkpZcat1YmEVAKpcMBaWRXUd2vGVKHtrBjzBGDL=j6mbA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 14, 2022 at 10:04 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> Well, the point of inventing this new vacuum mode was because I
> thought that upon reaching xidStopLimit, we couldn't issue commands,
> period, under the postmaster. If it was easier to get a test instance
> to xidStopLimit, I certainly would have discovered this sooner.

I did notice from my own testing of the failsafe (by artificially
inducing wraparound failure using an XID burning C function) that
autovacuum seemed to totally correct the problem, even when the system
had already crossed xidStopLimit - it came back on its own. I wasn't
completely sure of how robust this effect was, though.

> When
> Andres wondered about getting away from single user mode, I assumed
> that would involve getting into areas too deep to tackle for v15. As
> Robert pointed out, lazy_truncate_heap is the only thing that can't
> happen for vacuum at this point, and fully explains why in versions <
> 14 our client's attempts to vacuum resulted in error. Since the
> failsafe mode turns off truncation, vacuum should now *just work* near
> wraparound. If there is any doubt, we can tighten the check for
> entering failsafe.

Obviously having to enter single user mode is horrid. If we can now
update the advice to something more reasonable, that would help users
who find themselves in this situation a great deal.

> Now, it's certainly possible that autovacuum is either not working at
> all because of something broken, or is not working on the oldest
> tables at the moment, so one thing we could do is to make VACUUM [with
> no tables listed] get the tables from pg_class in reverse order of
> max(xid age, mxid age). That way, the horizon will eventually pull
> back over time and the admin can optionally cancel the vacuum at some
> point. Since the order is harmless when it's not needed, we can do
> that unconditionally.

My ongoing work on freezing/relfrozenxid tends to make the age of
relfrozenxid much more indicative of the amount of work that VACUUM
would have to do when run -- not limited to freezing. You could
probably do this anyway, but it's nice that that'll be true.
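
A minimal Python sketch of the ordering John proposes above, for illustration only. It is not how VACUUM actually selects tables, and TableInfo, wrapped_age() and vacuum_order() are hypothetical names. It assumes a table's age is the 32-bit wraparound difference between the next XID (or MultiXactId) and its relfrozenxid (or relminmxid), read as a signed 32-bit value much like age() and mxid_age(). Tables are sorted oldest first, so the horizon can advance as each one is processed and the admin could cancel partway through.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


def wrapped_age(current: int, old: int) -> int:
    """32-bit modular difference current - old, read as a signed int32.

    Assumption: does not special-case PostgreSQL's permanent XIDs,
    which age() reports as INT_MAX.
    """
    diff = (current - old) & MASK32
    return diff - (1 << 32) if diff >= (1 << 31) else diff


@dataclass
class TableInfo:
    # Hypothetical stand-in for the pg_class columns the proposal uses.
    name: str
    relfrozenxid: int
    relminmxid: int


def vacuum_order(tables, next_xid, next_mxid):
    """Return tables sorted descending by max(xid age, mxid age)."""
    def key(t: TableInfo) -> int:
        return max(wrapped_age(next_xid, t.relfrozenxid),
                   wrapped_age(next_mxid, t.relminmxid))
    return sorted(tables, key=key, reverse=True)


# Table "c" froze just before the XID counter wrapped, so it is the oldest.
tables = [TableInfo("a", 1000, 1), TableInfo("b", 500, 1),
          TableInfo("c", 4294967000, 1)]
print([t.name for t in vacuum_order(tables, next_xid=2000, next_mxid=10)])
# -> ['c', 'b', 'a']
```

Taking the max of the two ages means a table whose MultiXact horizon is the more urgent one still moves to the front of the list.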

--
Peter Geoghegan
