Re: What is "wraparound failure", really?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: What is "wraparound failure", really?
Date: 2021-06-28 06:39:48
Message-ID: CAH2-Wz=UJYPOJJQZJzpdnKiCiy8T7t8MLaud7CgL4-GuOC9gaA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 27, 2021 at 4:23 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> AIUI, actual wraparound (i.e. an xid crossing the event horizon so it
> appears to be in the future) is no longer possible. But it once was a
> very real danger. Maybe the docs haven't quite caught up.

That protection was added a few years after freezing was first
invented, which was arguably the last time that the design
fundamentally changed. I think we all agree that it's not okay to give
wrong answers to queries -- it doesn't even need to be stated in the
docs IMV. So why does this section of the docs spend so much time
talking about something that cannot happen? Why not have it focus on
the bad outcome that there is a real risk of instead? Namely the risk
of the system refusing to allocate new XIDs (as a means of avoiding
the wrong answers when all else fails).

It's hard to talk about the new failsafe in this section of the docs
now, since it's unclear whether the section exists to advise the user
on ways of avoiding the "can't allocate XIDs" failure mode. It could be
interpreted that way, or it could just be explaining and/or justifying
the existence of the failure mode. That seems like a real problem.

> In practical terms, there is an awful lot of head room between the
> default for autovacuum_freeze_max_age and any danger of major
> anti-wraparound measures. Say you increase it to 1bn from the default
> 200m. That still leaves you ~1bn transactions of headroom.

I agree that in practice that's often fine. But my point is that there
is another very good reason to not increase autovacuum_freeze_max_age,
contrary to what the docs say (actually there is a far better reason
than truncating clog). Namely, increasing it will generally increase
the risk of VACUUM not finishing in time. If that happens the user
gets the "can't allocate XIDs" failure mode (which is what I have
called wraparound failure up until now), which is one of the worst
things that can happen. This makes the inability to truncate clog look
like a totally trivial issue in comparison.
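
The headroom arithmetic discussed above can be sketched roughly as
follows. This is only an illustration, not anything taken from the
PostgreSQL source: the function name is hypothetical, the 2^31 horizon
is the usual "half of the 32-bit XID space" figure, and the 3 million
XID stop margin is an assumption based on what recent releases
document.

```python
# Rough headroom arithmetic for autovacuum_freeze_max_age.
# Both constants are assumptions made for this sketch.
WRAP_HORIZON = 2**31      # XIDs that fit in the "past" half of the XID circle
STOP_MARGIN = 3_000_000   # assumed margin at which new XIDs are refused


def xid_headroom(freeze_max_age: int) -> int:
    """XIDs that can still be consumed between the point where an
    anti-wraparound autovacuum is forced (table age reaches
    freeze_max_age) and the point where XID allocation stops."""
    return WRAP_HORIZON - STOP_MARGIN - freeze_max_age


if __name__ == "__main__":
    for setting in (200_000_000, 1_000_000_000,
                    1_500_000_000, 2_000_000_000):
        print(f"{setting:>13,} -> {xid_headroom(setting):>13,} XIDs of headroom")
```

Note that the result is a count of XIDs, not an amount of time. Whether
VACUUM finishes inside that window depends on how fast the workload
consumes XIDs and how long the table takes to scan, which is why the
risk grows as the setting goes up.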

Reasonable people can disagree about when and how increasing
autovacuum_freeze_max_age becomes truly reckless. However, I don't
think that anybody would be willing to argue that setting it to the
maximum of 2 billion could ever make sense in production, to go with
the obvious extreme case. The benefits that you get from such a high
setting over and above what you get with a moderately high setting
(perhaps 1 to 1.5 billion) are really quite small, while the risk
shoots up fast past a certain point.

Regardless of what the nuances of increasing autovacuum_freeze_max_age
are, stating that the sole disadvantage is that you cannot truncate
clog and other SLRUs is clearly wrong.

--
Peter Geoghegan
