Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: samay sharma <smilingsamay(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Date: 2023-05-11 20:40:29
Message-ID: CAH2-Wznc2q_yhu0KNYwxfo1mdA4xHCm1aw4rLij50nLkhcRjJQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 11, 2023 at 1:04 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> Fwiw while "wraparound" has pitfalls I think changing it for a new
> word isn't really helpful. Especially if it's a mostly meaningless
> word like "overload" or "exhaustion". It suddenly makes every existing
> doc hard to find and confusing to read.

Just to be clear, I am not proposing changing the name of
anti-wraparound autovacuum at all. What I'd like to do is use a term
like "XID exhaustion" to refer to the state that we internally refer
to as xidStopLimit. My motivation is simple: we've completely
terrified users by emphasizing wraparound, which is something that is
explicitly and prominently presented as a variety of data corruption.
The docs say this:

"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound: the XID counter wraps around to zero, and
all of a sudden transactions that were in the past appear to be in the
future — which means their output become invisible. In short,
catastrophic data loss."

> I say "exhaustion" or "overload" are meaningless because their meaning
> is entirely dependent on context. It's not like memory exhaustion or
> i/o overload where it's a finite resource and it's just the sheer
> amount in use that matters.

But transaction IDs are a finite resource, in the sense that you can
never have more than about 2.1 billion distinct unfrozen XIDs at any
one time. "Transaction ID exhaustion" is therefore a lot more
descriptive of the underlying problem. It's a lot better than
wraparound, which, as I've said, is inaccurate in two major ways:

1. Most cases involving xidStopLimit (or even single-user mode data
corruption) won't involve any kind of physical integer wraparound.

2. Most physical integer wraparound is harmless and perfectly routine.
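
To make those two points concrete, here is a rough sketch in Python (not
the server's actual code). It assumes XIDs are compared circularly,
modulo 2^32, using the sign of the 32-bit difference, and it ignores the
handful of reserved special XIDs. The function names and the
safety_margin value are hypothetical, chosen only for illustration.

```python
XID_SPACE = 2**32   # XIDs are 32-bit counters
HALF_SPACE = 2**31  # roughly 2.1 billion: the usable "window" of XIDs


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b (circular comparison).

    Equivalent to treating (a - b) as a signed 32-bit integer and
    checking whether it is negative.
    """
    return (a - b) % XID_SPACE >= HALF_SPACE


def xids_consumed(oldest_unfrozen: int, next_xid: int) -> int:
    """Number of XIDs allocated since the oldest unfrozen XID."""
    return (next_xid - oldest_unfrozen) % XID_SPACE


def is_exhausted(oldest_unfrozen: int, next_xid: int,
                 safety_margin: int = 1_000_000) -> bool:
    """True once the window of unfrozen XIDs is (nearly) used up.

    safety_margin is an illustrative assumption, not a value taken
    from any particular release.
    """
    return xids_consumed(oldest_unfrozen, next_xid) >= HALF_SPACE - safety_margin


# Point 2: the counter physically wraps past 2^32, yet nothing is wrong.
# Only 1500 XIDs separate the oldest unfrozen XID from the next one.
wrapped_oldest, wrapped_next = 2**32 - 1000, 500

# Point 1: exhaustion with no physical wraparound at all.
# next_xid is still far below 2^32, but the window is nearly full.
stuck_oldest, stuck_next = 1000, 1000 + 2**31 - 500
```

In the first case xid_precedes(wrapped_oldest, wrapped_next) still gives
the right ordering and is_exhausted() is false, even though the counter
wrapped. In the second case is_exhausted() is true although the counter
never wrapped. What matters is the distance from the oldest unfrozen
XID, not where the counter sits in the 32-bit space.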

But even this is fairly secondary to me. I don't actually think it's
that important that the name describe exactly what's going on here --
that's expecting rather a lot from a name. That's not really the goal.
The goal is to undo the damage of documentation that heavily implies,
in its basic introductory remarks on freezing, that data corruption is
the eventual result of not doing enough vacuuming.

Like Samay, my consistent experience (particularly back in my Heroku
days) has been that people imagine that data corruption would happen
when the system reached what we'd call xidStopLimit. Can you blame
them for thinking that? Almost any name for xidStopLimit that doesn't
have that historical baggage seems likely to be a vast improvement.

--
Peter Geoghegan
