Re: do only critical work during single-user vacuum?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: do only critical work during single-user vacuum?
Date: 2022-03-15 21:48:24
Message-ID: CAH2-Wzk0ZrsYx7FxQZ2dGm4XLDNwp2jpNDXwjpASiJNs4HBhDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 16, 2022 at 12:43 AM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> I'll put some effort in finding any way that it might not be robust.
> After that, changing the message and docs is trivial.

It would be great to be able to totally drop the idea of using
single-user mode before Postgres 15 feature freeze. How's that going?

I suggest that we apply the following patch as part of that work. It
adds one last final failsafe check at the point that VACUUM makes a
final decision on rel truncation.

It seems unlikely that the patch will ever make the crucial difference
in a wraparound scenario -- in practice it's very likely that we'd
have triggered the wraparound at that point if we run into trouble
with the target rel's relfrozenxid age. And even if it does get to
that point, it would still be possible for the autovacuum launcher to
launch another autovacuum -- this time around we will avoid rel
truncation, restoring the system to normal operation (i.e. no more
xidStopLimit state).

On the other hand it's possible that lazy_cleanup_all_indexes() will
take a very long time to run, and it runs after the current final
failsafe check. An index AM's amvacuumcleanup() routine can take a
long time to run sometimes, especially with GIN indexes. And so it's
just about possible that we won't have triggered the failsafe by the
time lazy_cleanup_all_indexes() is called, which then spends a long
time doing index cleanup -- long enough for the system to reach
xidStopLimit due to the target rel's relfrozenxid age crossing the
crucial xidStopLimit crossover point.

This patch makes this problem scenario virtually impossible. Right now
I'm only prepared to say it's very unlikely. I don't see a reason to
take any chances, though.

--
Peter Geoghegan

Attachment Content-Type Size
v1-0001-Perform-final-failsafe-check-before-truncation.patch application/octet-stream 922 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2022-03-15 21:54:08 Re: Skip partition tuple routing with constant partition key
Previous Message Nathan Bossart 2022-03-15 21:48:13 Re: Optimize external TOAST storage