Re: Improving the "Routine Vacuuming" docs

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improving the "Routine Vacuuming" docs
Date: 2022-04-13 16:34:22
Message-ID: CAH2-Wzk_fRWVcgN0KkwSKSJDKsz0po=s4e__dac=PhHNP+jaUg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 13, 2022 at 8:40 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Something along the lines of the following seems more useful: "A tuple
> > whose xmin is frozen (and xmax is unset) is considered visible to
> > every possible MVCC snapshot. In other words, the transaction that
> > inserted the tuple is treated as if it ran and committed at some point
> > that is now *infinitely* far in the past."
>
> I agree with this idea.

Cool. Maybe I should write a doc patch just for this part, then.

What do you think of the idea of relating freezing to removing tuples
by VACUUM at this point? This would be a basis for explaining how
freezing and tuple removal are constrained by the same cutoff. A very
old snapshot can hold up cleanup, but it can also hold up freezing to
the same degree (it's just not as obvious because we are less eager
about freezing by default).

> > The alarming language isn't proportionate to the true danger
> > (something I complained about in a dedicated thread last year [1]).
>
> I mostly agree with this, but not entirely. The section needs some
> rephrasing, but xidStopLimit doesn't apply in single-user mode, and
> relfrozenxid and datfrozenxid values can and do get corrupted. So it's
> not a purely academic concern.

I accept that the distinction you want to make is valid. More on that below.

> > * XID space isn't really a precious resource -- it isn't even a
> > resource at all IMV.
>
> I disagree with this. Usable XID space is definitely a resource, and
> if you're in the situation where you care deeply about this section of
> the documentation, it's probably one in short supply. Being careful
> not to expend too many XIDs while fixing the problems that have caused
> you to be short of safe XIDs is *definitely* a real thing.

I may have gone too far with this metaphor. My point was mostly that
XID space has a highly unpredictable cost (paid in freezing).

Perhaps we can agree on some (or even all) of the following specific points:

* We shouldn't mention "4 billion XIDs" at all.

* We should say that the issue is an issue of distances between
unfrozen XIDs. The maximum distance that can ever be allowed to emerge
between any two unfrozen XIDs in a cluster is about 2 billion XIDs.

* We don't need to say anything about how XIDs are compared, normal vs
permanent XIDs, etc.

* The system intervenes drastically to prevent this implementation
restriction from becoming a problem, starting with anti-wraparound
autovacuums. Then there's the failsafe. Finally, there's the
xidStopLimit mechanism, our last line of defense.

> I think it is wrong to conflate wraparound with xidStopLimit.
> xidStopLimit is the final defense against an actual wraparound, and
> like I say, an actual wraparound is quite possible if you put the
> system in single user mode and then do something like this:

I forgot to emphasize one aspect of the problem that seems quite
important: the document itself seems to conflate the xidStopLimit
mechanism with true wraparound. At least I thought so. Last year's
thread on this subject ('What is "wraparound failure", really?') was
mostly about that confusion. I personally found that very confusing,
and I doubt that I'm the only one.

There is no good reason to use single user mode anymore (a related
problem with the docs is that we still haven't made that point). And
the pg_upgrade bug that led to invalid relfrozenxid values was
flagrantly just a bug (a WARNING for this case was added recently, in
commit e83ebfe6). So while I accept that the distinction you're making here
is valid, maybe we can fix the single user mode doc bug too, removing
the need to discuss "true wraparound" as a general phenomenon. You
shouldn't ever see it in practice anymore. If you do then either
you've done something that "invalidated the warranty", or you've run
into a legitimate bug.

--
Peter Geoghegan
