Re: Improving the "Routine Vacuuming" docs

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improving the "Routine Vacuuming" docs
Date: 2022-04-13 21:18:32
Message-ID: CAH2-Wz=N3Di7iKBCWyyR4D_U61sNSZzagX=JL3c_BTf3fQzaoQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 13, 2022 at 1:25 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Apr 13, 2022 at 12:34 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > What do you think of the idea of relating freezing to removing tuples
> > by VACUUM at this point? This would be a basis for explaining how
> > freezing and tuple removal are constrained by the same cutoff.

> I think something like that could be useful, if we can find a way to
> word it sufficiently clearly.

What if the current "25.1.5. Preventing Transaction ID
Wraparound Failures" section was split into two parts? The first part
would cover freezing, the second part would cover
relfrozenxid/relminmxid advancement.

Freezing can sensibly be discussed before introducing relfrozenxid.
Freezing is a maintenance task that makes tuples self-contained
things, suitable for long term storage. Freezing makes tuples not rely
on transient transaction metadata (mainly clog), and so is an overhead
of storing data in Postgres long term.

That's how I think of it, at least. That definition seems natural to me.
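To make the dependence on clog concrete, here is a toy Python model. It is only a sketch: HeapTuple, inserter_committed, and freeze are hypothetical names made up for illustration, and the dict standing in for clog looks nothing like the real structure. It shows just the one property described above: once a tuple is frozen, its status can be determined without the transient metadata.

```python
from dataclasses import dataclass

COMMITTED = "committed"
ABORTED = "aborted"


@dataclass
class HeapTuple:
    """Hypothetical stand-in for a heap tuple (not the real layout)."""
    xmin: int             # XID of the inserting transaction
    frozen: bool = False  # set once the tuple has been frozen


def inserter_committed(tup, clog):
    """Return True if the tuple's inserting transaction committed."""
    if tup.frozen:
        # A frozen tuple is self-contained: no clog lookup is needed.
        return True
    # An unfrozen tuple depends on transient commit-status metadata.
    return clog[tup.xmin] == COMMITTED


def freeze(tup, clog):
    """Freeze the tuple if its inserter is known to have committed."""
    if clog[tup.xmin] == COMMITTED:
        tup.frozen = True


clog = {100: COMMITTED, 101: ABORTED}
old_row = HeapTuple(xmin=100)
freeze(old_row, clog)
clog.clear()  # simulate discarding old commit-status data
assert inserter_committed(old_row, clog)  # still answerable after the discard
```

An unfrozen tuple in the same situation would raise KeyError in this model, which is the toy equivalent of needing clog data that is no longer there.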

> Those all sound pretty reasonable.

Great.

I have two more things that I see as problems. Would be good to get
your thoughts here, too. They are:

1. We shouldn't really be discussing VACUUM FULL here at all, except
to say that it's out of scope, and probably a bad idea.

You once wrote about the problem of how VACUUM FULL is perceived by
users (VACUUM FULL doesn't mean "VACUUM, but better"), expressing an
opinion of VACUUM FULL that I agree with fully. The docs definitely
contributed to that problem.

2. We don't go far enough in emphasizing the central role of autovacuum.

Technically the entire section assumes that its primary audience is
those users who have opted not to use autovacuum. This seems entirely
backwards to me.

We should make it clear that technically autovacuum isn't all that
different from running your own VACUUM commands, because that's an
important part of understanding autovacuum. But that's all. ISTM that
anybody that *entirely* opts out of using autovacuum is just doing it
wrong (besides, it's kind of impossible to do it anyway, what with
anti-wraparound autovacuum being impossible to disable).

There is definitely a role for using tools like cron to schedule
off-hours VACUUM operations, and that's still worth pointing out
prominently. But that should be a totally supplementary thing, used
when the DBA understands that running VACUUM off-hours is less
disruptive.

> There's a little bit of doubt in my
> mind about the third one; I think it could possibly be useful to
> explain that the XID space is circular and 0-2 are special, but maybe
> not.

I understand the concern. I'm not saying that this kind of information
doesn't have any business being in the docs. Just that it has no
business being in this particular chapter of the docs. In fact, it
doesn't even belong in "III. Server Administration". If it belongs
anywhere, it should be in some chapter from "VII. Internals".

Discussing it here just seems inappropriate (and would be even if it
wasn't how we introduce discussion of wraparound). It's really only
tangentially related to VACUUM anyway. It seems like it should be
covered when discussing the heapam on-disk representation.

> I think it is probably important to discuss this, but along the lines
> of: it is possible to bypass all of these safeguards and cause a true
> wraparound by running in single-user mode. Don't do that. There's no
> wraparound situation that can't be addressed just fine in multi-user
> mode, and here's how to do that. In previous releases, we used to
> sometimes recommend single user mode, but that's no longer necessary
> and not a good idea, so steer clear.

Yeah, that should probably happen somewhere.

On the other hand...why do we even need to tolerate wraparound in
single-user mode? I do see some value in reserving extra XIDs that can
be used in single-user mode (apparently single-user mode can be used
in scenarios where you have buggy event triggers, things like that).
But that in itself does not justify allowing single-user mode to
exceed xidWrapLimit.

Why shouldn't single-user mode also refuse to allocate new XIDs when
we reach xidWrapLimit (as opposed to when we reach xidStopLimit)?

Maybe there is a good reason to believe that allowing single-user mode
to corrupt the database is the lesser evil, but if there is then I'd
like to know the reason.
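
For reference, the relationship between the two limits can be sketched in Python. This is a simplified model and not the server code: the function names and the two boolean flags are hypothetical, the 3,000,000 margin between the limits is an assumption that may not match every release, and real allocation involves more checks than shown. The cap_single_user flag models the idea floated above, where single-user mode would refuse new XIDs at xidWrapLimit.

```python
MAX_XID = 0xFFFFFFFF       # XIDs are 32-bit and the space is circular
FIRST_NORMAL_XID = 3       # XIDs 0-2 are special
STOP_MARGIN = 3_000_000    # assumed gap between the stop and wrap limits


def xid_limits(oldest_datfrozenxid, stop_margin=STOP_MARGIN):
    """Return (xidWrapLimit, xidStopLimit) for the oldest datfrozenxid."""
    wrap = (oldest_datfrozenxid + (MAX_XID >> 1)) & MAX_XID
    if wrap < FIRST_NORMAL_XID:
        wrap += FIRST_NORMAL_XID  # skip over the special XIDs
    stop = (wrap - stop_margin) & MAX_XID
    if stop < FIRST_NORMAL_XID:
        stop = (stop - FIRST_NORMAL_XID) & MAX_XID
    return wrap, stop


def xid_precedes(a, b):
    """Circular comparison: True if normal XID a is logically before b."""
    # Equivalent to the signed 32-bit difference (a - b) being negative.
    return ((a - b) & MAX_XID) >= 0x80000000


def may_allocate(next_xid, oldest_datfrozenxid, single_user, cap_single_user):
    """Decide whether next_xid may be handed out (hypothetical policy model)."""
    wrap, stop = xid_limits(oldest_datfrozenxid)
    if not single_user:
        # Multi-user mode refuses once xidStopLimit is reached.
        return xid_precedes(next_xid, stop)
    if cap_single_user:
        # The idea above: single-user mode refuses at xidWrapLimit.
        return xid_precedes(next_xid, wrap)
    # Behavior described above: no hard limit in single-user mode.
    return True


wrap, stop = xid_limits(1000)
assert not may_allocate(stop, 1000, single_user=False, cap_single_user=False)
assert may_allocate(stop, 1000, single_user=True, cap_single_user=True)
assert not may_allocate(wrap, 1000, single_user=True, cap_single_user=True)
```

In this model the XIDs between the two limits remain available to single-user mode either way; the only difference the cap makes is that allocation can no longer run past xidWrapLimit.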

--
Peter Geoghegan
