Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing
Date: 2023-04-26 17:57:40
Message-ID: CAH2-Wz=VxwmbQowiyvf_5zCNUU_LZesB+TVW-BCe2dONcrNbOw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 26, 2023 at 12:16 AM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> Now is a great time to revise this section, in my view. (I myself am about ready to get back to testing and writing for the task of removing that "obnoxious hint".)

Although I didn't mention the issue with single user mode in my
introductory email (the situation there is just appalling IMV), it
seems like I might not be able to ignore that problem while I'm
working on this patch. Declaring that as out of scope for this doc
patch series (on pragmatic grounds) feels awkward. I have to work
around something that is just wrong. For now, the doc patch just has
an "XXX" item about it. (Hopefully I'll think of a more natural way of
not fixing it.)

> > This initial version is still quite lacking in overall polish, but I
> > believe that it gets the general structure right. That's what I'd like
> > to get feedback on right now: can I get agreement with me about the
> > general nature of the problem? Does this high level direction seem
> > like the right one?
>
> I believe the high-level direction is sound, and some details have been discussed before.

I'm relieved that you think so. I was a bit worried that I'd get
bogged down, having already invested a lot of time in this.

Attached is v2. It has the same high level direction as v1, but is a
lot more polished. Still not committable, to be sure. But better than
v1.

I'm also attaching a prebuilt copy of routine-vacuuming.html, as with
v1 -- hopefully that's helpful.

> > 3. All of the stuff about modulo-2^32 arithmetic is moved to the
> > storage chapter, where we describe the heap tuple header format.
>
> It does seem to be an excessive level of detail for this chapter, so +1. Speaking of excessive detail, however...(skipping ahead)

My primary objection to talking about modulo-2^32 stuff first is not
that it's an excessive amount of detail (though it definitely is). My
objection is that it places emphasis on exactly the thing that *isn't*
supposed to matter, under the design of freezing -- greatly confusing
the reader (even sophisticated readers). Discussion of so-called
wraparound should start with logical concepts, such as xmin XIDs being
treated as "infinitely far in the past" once frozen. The physical data
structures do matter too, but even there the emphasis should be on
heap pages being "self-contained", in the sense that SQL queries won't
need to access pg_xact to read the rows from the pages going forward
(even on standbys).

Why do we call wraparound wraparound, anyway? The 32-bit XID space is
circular! The whole point of the design is that unsigned integer
wraparound is meaningless -- there isn't really a point in "the
circle" that you should think of as the start point or end point.
(We're probably stuck with the term "wraparound" for now, so I'm not
proposing that it be changed here, purely on pragmatic grounds.)
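To make the "circle" point concrete, here is a minimal sketch of comparing two XIDs on a 2^32 circle. It is loosely modeled on the signed-difference trick used for normal XIDs, and it deliberately ignores the special reserved XIDs. The function names are my own illustration, not anything in the tree.

```python
XID_MASK = 0xFFFFFFFF  # XIDs are treated here as 32-bit unsigned values


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b.

    The difference is taken modulo 2^32 and then read as a signed
    32-bit number: a "negative" difference means a is behind b on
    the circle. No point on the circle is special.
    """
    diff = (a - b) & XID_MASK
    return diff >= 0x80000000  # high bit set == negative as int32


def xid_age(xid: int, next_xid: int) -> int:
    """Distance from xid forward to next_xid, modulo 2^32."""
    return (next_xid - xid) & XID_MASK


if __name__ == "__main__":
    before_wrap = 0xFFFFFFF0  # allocated just before the counter wraps
    after_wrap = 0x00000010   # allocated just after the counter wraps

    # The numerically larger XID is still the older one.
    print(xid_precedes(before_wrap, after_wrap))
    print(xid_age(before_wrap, after_wrap))
```

The comparison only gives sensible answers while the two XIDs are less than 2^31 apart, which is the invariant that freezing exists to maintain.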

> + <note>
> + <para>
> + There is no fundamental difference between a
> + <command>VACUUM</command> run during anti-wraparound
> + autovacuuming and a <command>VACUUM</command> that happens to
> + use the aggressive strategy (whether run by autovacuum or
> + manually issued).
> + </para>
> + </note>
>
> I don't see the value of this, from the user's perspective, of mentioning this at all, much less for it to be called out as a Note. Imagine a user who has been burnt by non-cancellable vacuums. How would they interpret this statement?

I meant that it isn't special from the point of view of vacuumlazy.c.
I do see your point, though. I've taken that out in v2.

(I happen to believe that the antiwraparound autocancellation behavior
is very unhelpful as currently implemented, which biased my view of
this.)

> > 4. No more separate section for MultiXactID freezing -- that's
> > discussed as part of the discussion of page-level freezing.
> >
> > Page-level freezing takes place without regard to the trigger
> > condition for freezing. So the new approach to freezing has a fixed
> > idea of what it means to freeze a given page (what physical
> > modifications it entails). This means that having a separate sect3
> > subsection for MultiXactIds now makes no sense (if it ever did).
>
> I have no strong opinion on that.

Most of the time, when antiwraparound autovacuums are triggered by
autovacuum_multixact_freeze_max_age in a way that is noticeable (say,
on a large table), VACUUM will in all likelihood end up processing
exactly 0 multis. What you'll get is pretty much an "early" aggressive
VACUUM, which isn't such a big deal (especially with page-level
freezing). You can already get an "early" aggressive VACUUM due to
hitting vacuum_freeze_table_age before autovacuum_freeze_max_age is
ever reached (in fact it's the common case, now that we have
insert-driven autovacuums).
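A rough sketch of how the two XID-based thresholds relate, to show why the "early" aggressive VACUUM is the common case. This is a simplified model, not the server's logic: the function name is hypothetical, the constants are the documented defaults for vacuum_freeze_table_age and autovacuum_freeze_max_age, and the exact boundary comparisons are an assumption.

```python
# Documented defaults; a real installation may override both.
VACUUM_FREEZE_TABLE_AGE = 150_000_000
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


def classify_vacuum(relfrozenxid_age: int,
                    freeze_table_age: int = VACUUM_FREEZE_TABLE_AGE,
                    freeze_max_age: int = AUTOVACUUM_FREEZE_MAX_AGE) -> dict:
    """Hypothetical helper: which thresholds has this table's age crossed?

    relfrozenxid_age stands in for age(relfrozenxid) of the table.
    """
    # The effective table age is capped at 95% of freeze_max_age, so an
    # aggressive VACUUM always becomes possible before the antiwraparound
    # trigger is reached.
    effective_table_age = min(freeze_table_age, int(freeze_max_age * 0.95))
    return {
        # Any VACUUM of the table (for example an insert-driven
        # autovacuum) uses the aggressive strategy past this point.
        "aggressive": relfrozenxid_age >= effective_table_age,
        # Past this point autovacuum launches a VACUUM on its own.
        "antiwraparound": relfrozenxid_age > freeze_max_age,
    }


if __name__ == "__main__":
    for age in (100_000_000, 160_000_000, 250_000_000):
        print(age, classify_vacuum(age))
```

With the defaults, any VACUUM that happens to run while the table's age is between 150 million and 200 million XIDs is already aggressive, long before an antiwraparound autovacuum would be triggered. The multixact settings follow the same pattern with their own thresholds.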

So I'm trying to suggest that an aggressive VACUUM is the same
regardless of the trigger condition. To a lesser extent, I'm trying to
make the user aware that the mechanical difference between aggressive
and non-aggressive is fairly minor, even if the consequences of that
difference are quite noticeable. (Though maybe they're less noticeable
with the v16 work in place.)

> I've only taken a cursory look, but will look more closely as time permits.

I would really appreciate that. This is not easy work.

I suspect that the docs talk about wraparound using the most alarming
language possible because at one point it really was necessary to
scare users into running VACUUM to avoid data loss. This was before
autovacuum, and before the invention of vxids, and even before the
invention of freezing. It was up to you as a user to VACUUM your
database using cron, and if you didn't then eventually data loss could
result.

Obviously these docs were updated many times over the years, but I
maintain that the basic structure from 20 years ago is still present
in a way that it really shouldn't be.

> (Side note: My personal preference for rough doc patches would be to leave out spurious whitespace changes.

I've tried to keep them out (or at least break the noisy whitespace
changes out into their own commit). I might have missed a few in v1;
those are fixed in v2.

Thanks
--
Peter Geoghegan

Attachment                                                       Content-Type              Size
routine-vacuuming.html                                           text/html                 48.6 KB
v2-0006-Merge-basic-vacuuming-sect2-into-sect1-introducti.patch  application/octet-stream  6.2 KB
v2-0008-Overhaul-freezing-and-wraparound-docs.patch              application/octet-stream  53.6 KB
v2-0007-Make-maintenance.sgml-more-autovacuum-orientated.patch   application/octet-stream  7.4 KB
v2-0001-Make-autovacuum-docs-into-a-sect1-of-its-own.patch       application/octet-stream  17.3 KB
v2-0009-Overhaul-Recovering-Disk-Space-vacuuming-docs.patch      application/octet-stream  10.7 KB
v2-0004-Reorder-routine-vacuuming-sections.patch                 application/octet-stream  16.4 KB
v2-0002-Restructure-autovacuum-daemon-section.patch              application/octet-stream  5.3 KB
v2-0003-Normalize-maintenance.sgml-indentation.patch             application/octet-stream  4.7 KB
v2-0005-Move-Interpreting-XID-stamps-from-tuple-headers.patch    application/octet-stream  9.3 KB
