Re: Improving the "Routine Vacuuming" docs

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improving the "Routine Vacuuming" docs
Date: 2022-04-13 00:22:04
Message-ID: CAH2-Wz=zoUyxh1fJPoYG5-vreR9wks6XdKh7y2H-o=HzV==N_Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 12, 2022 at 4:24 PM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> I've attached some off-the-cuff thoughts on reworking the first three paragraphs and the note.
>
> It's hopefully useful for providing perspective if nothing else.

More perspective is definitely helpful.

> I'm assuming and caring only about visible rows when I'm reading this section. Maybe we need to make that explicit - only xmin matters (and the invisible frozen flag)?

The statement "only xmin matters" is true in spirit. If xmax needed to
be frozen then we'd actually remove the whole tuple instead (unless it
was a MultiXact). Alternatively, if it looked like xmax needed to be
frozen, but the XID turned out to have been from an aborted xact, then
we'd clear the XID from xmax instead.

Freezing tends to lag the removal of dead tuples, but that's just an
optimization. If you set vacuum_freeze_min_age to 0 then freezing and
dead tuple removal happen in tandem (actually, we can remove tuples
inserted by an XID after VACUUM's OldestXmin/removal cutoff when the
inserting xact aborts, but VACUUM makes no real promises about XIDs >=
OldestXmin anyway).
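
As a rough illustration of that lag, here is a simplified Python model of the
two cutoffs. The function names are hypothetical (they are not PostgreSQL
functions), and the model assumes XIDs that have not wrapped around and ignores
the clamping the real code applies. It only shows that the freeze cutoff trails
OldestXmin by vacuum_freeze_min_age, so a setting of 0 makes the two coincide.

```python
# Simplified, hypothetical model of VACUUM's removal and freeze cutoffs.
# Assumption: plain integer XIDs with no wraparound, and oldest_xmin larger
# than vacuum_freeze_min_age. This is not PostgreSQL source code.

def freeze_cutoff(oldest_xmin, vacuum_freeze_min_age):
    """XIDs older than the returned value are candidates for freezing."""
    return oldest_xmin - vacuum_freeze_min_age


def is_freeze_candidate(xmin, oldest_xmin, vacuum_freeze_min_age):
    """True if a live tuple's xmin falls behind the freeze cutoff."""
    return xmin < freeze_cutoff(oldest_xmin, vacuum_freeze_min_age)


oldest_xmin = 200_000_000   # removal cutoff for this hypothetical VACUUM
xmin = 180_000_000          # inserter of some live, all-visible tuple

# With the default vacuum_freeze_min_age of 50 million, freezing lags:
print(is_freeze_candidate(xmin, oldest_xmin, 50_000_000))
# With vacuum_freeze_min_age = 0, the freeze cutoff equals OldestXmin:
print(is_freeze_candidate(xmin, oldest_xmin, 0))
```

In this model the same tuple is left unfrozen under the default setting and
frozen when the setting is 0, which is the "in tandem" behavior described
above.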

>> It might also be useful to describe freezing all of a live tuple's
>> XIDs as roughly the opposite process as completely physically removing
>> a dead tuple. It follows that we don't necessarily need to freeze
>> anything to advance relfrozenxid (especially not on Postgres 15).
>
>
> I failed to pick up on how this and "mod-2^32" math interplay, and I'm not sure I care when reading this. It made more sense to consider "shortest path" along the "circle".

It would probably be possible to teach the system to deal with
coexisting XIDs that are close to a full ~4 billion XIDs apart. We'd
have to give up on the mod-2^32 comparison stuff in functions like
TransactionIdPrecedes(), and carefully keep track of things per-table,
and carry that context around a lot more. I certainly don't think that
that's a good idea (if 2 billion XIDs wasn't enough, why should 4
billion XIDs be?), but it does seem feasible.
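
For readers who have not seen it, the comparison in question can be sketched
in a few lines. This is a Python approximation of the idea behind
TransactionIdPrecedes(), not the C source; the helper names are made up for
illustration. Two normal XIDs are compared by taking their difference modulo
2^32 and reading it as a signed 32-bit value, which is why the answer is only
meaningful while the two XIDs are less than 2^31 apart.

```python
# Sketch of the circular ("mod-2^32") XID comparison. Names are hypothetical;
# the real function is written in C and operates on uint32 values.

FIRST_NORMAL_XID = 3  # XIDs below this are reserved special values


def as_int32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def xid_precedes(id1, id2):
    """True if id1 is logically older than id2 on the XID circle."""
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        # Special XIDs are compared as plain unsigned integers.
        return id1 < id2
    return as_int32(id1 - id2) < 0


# An XID just before the 32-bit counter wraps is older than one just after:
print(xid_precedes(4294967000, 100))
# Once two XIDs are more than 2^31 apart, the answer silently flips:
print(xid_precedes(100, 100 + 2**31 + 5))
```

The second call is the failure case: 100 really was allocated first, but the
signed difference no longer says so. Supporting XIDs that far apart would mean
replacing this comparison with something that carries extra context.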

My point is this: there is nothing particularly intuitive or natural
about the current ~2 billion XID limit, even if you already know that
XIDs are generally represented on disk as 32-bit unsigned integers.
And so the fact is that we are already asking users to take it on
faith that there are truly good reasons why the system cannot tolerate
any scenario in which two unfrozen XIDs are more than about 2 billion
XIDs apart. Why not just admit that, and then deem the XID comparison
rules out of scope for this particular chapter of the docs?

*Maybe* it's still useful to discuss why things work that way in code
like TransactionIdPrecedes(), but that's a totally different
discussion -- it doesn't seem particularly relevant to the design of
VACUUM, no matter the audience. Most DBAs will just accept that the
"XID distance" limit/invariant is about 2 billion XIDs for esoteric
implementation reasons, with some vague idea of why it must be so
(e.g., "32-bit integers don't have enough space"). They will be no
worse off for it.

(Bear in mind that the mod-2^32 comparison stuff was only added when
freezing/wraparound was first implemented back in 2001, by commit
bc7d37a525.)

>> Currently, "25.1.5. Preventing Transaction ID Wraparound Failures"
>> says this, right up-front:
>>
>> "But since transaction IDs have limited size (32 bits) a cluster that
>> runs for a long time (more than 4 billion transactions) would suffer
>> transaction ID wraparound"
>
>
> I both agree and disagree - where I settled (as of now) is reflected in the patch.

I just don't think that you need to make it any more complicated than
this: physical XID values are only meaningful when compared to other
XIDs from the same cluster. The system needs to make sure that no two
XIDs can ever be more than about 2 billion XIDs apart, and here's how
you as a DBA can help the system to make sure of that.
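
A sketch of what "XIDs apart" means in practice, assuming a simplified model:
the function and constant names below are hypothetical, and the real system
stops allocating XIDs some margin before the theoretical limit. The distance
is just the difference between the next XID to be assigned and the oldest
unfrozen XID, taken modulo 2^32, which is roughly what age(relfrozenxid)
reports in SQL.

```python
# Hypothetical model of XID "age" and remaining headroom. Not PostgreSQL code.

XID_SPACE = 2**32
MAX_XID_DISTANCE = 2**31  # theoretical limit; actual safety limits are lower


def xid_age(next_xid, oldest_unfrozen_xid):
    """Number of XIDs between the oldest unfrozen XID and the next XID."""
    return (next_xid - oldest_unfrozen_xid) % XID_SPACE


def xid_headroom(next_xid, oldest_unfrozen_xid):
    """XIDs that can still be consumed before the distance limit is hit."""
    return MAX_XID_DISTANCE - xid_age(next_xid, oldest_unfrozen_xid)


# The raw values mean nothing on their own; only the distance matters.
print(xid_age(1_500_000_000, 500_000_000))
print(xid_age(100, 4294967000))  # same idea across the 32-bit wrap point
print(xid_headroom(1_500_000_000, 500_000_000))
```

Freezing old tuples is what lets the oldest unfrozen XID move forward, which
shrinks the age and restores headroom.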

Discussion of the past becoming the future just isn't helpful, because
that simply cannot ever happen on any version of Postgres from the
last decade. Freezing is more or less an overhead of storing data in
Postgres long term (even medium term) -- that is the simple reality.
We should say so.

> I am wondering, for the more technical details, is there an existing place to send xrefs, do you plan to create one, or is it likely unnecessary?

I might end up doing that, but I just want to get a general sense of how
other hackers feel about it for now.

--
Peter Geoghegan
