Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-10-17 23:52:11
Message-ID: CAH2-WzkU42GzrsHhL2BiC1QMhaVGmVdb5HR0_qczz0Gu2aSn=A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 8, 2022 at 1:23 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> It might make sense to go further in the same direction by making
> "regular vs aggressive/antiwraparound" into a *strict* continuum. In
> other words, it might make sense to get rid of the two remaining cases
> where VACUUM conditions its behavior on whether this VACUUM operation
> is antiwraparound/aggressive or not.

I decided to go ahead with this in the attached revision, v5. This
revision totally gets rid of the general concept of discrete
aggressive/non-aggressive modes for each VACUUM operation (see
"v5-0004-Make-VACUUM-s-aggressive-behaviors-continuous.patch" and its
commit message). My new approach turned out to be simpler than the
previous half measures that I described as "unifying aggressive and
antiwraparound" (which itself first appeared in v3).

I now wish that I had all of these pieces in place for v1, since this
was the direction I had in mind all along -- that might have made
life easier for reviewers like Jeff. v5 turns out to need only a
little extra code anyway. It might have been less confusing if I'd
started this thread with something like v5 -- the story I need to tell
would have been simpler that way.

Note that we still retain what were previously "aggressive only"
behaviors. We only remove "aggressive" as a distinct mode of operation
that exclusively applies the aggressive behaviors. We're now selective
in how we apply each of the behaviors, based on the needs of the
table. We want to behave in a way that's proportionate to the problem
at hand, which is made easy by not tying anything to a discrete mode
of operation. The old distinction is a false dichotomy: why should a
VACUUM ever have only one reason for running, determined up front?

There are still antiwraparound autovacuums in v5, but that is really
just another way that autovacuum can launch an autovacuum worker (much
like it was before the introduction of the visibility map in 8.4) --
both conceptually, and in terms of how the code works in vacuumlazy.c.
In practice an antiwraparound autovacuum is guaranteed to advance
relfrozenxid in roughly the same way as on HEAD (otherwise what's the
point?), but that doesn't make the VACUUM operation itself special in
any way. Besides, antiwraparound autovacuums will naturally be rare,
because there are many more opportunities for a VACUUM to advance
relfrozenxid "early" now (only "early" relative to how it would work
on earlier Postgres versions). It's already clear that having
antiwraparound autovacuums and aggressive mode VACUUMs as two separate
concepts that are closely associated has some problems [1]. Formally
making antiwraparound autovacuums just another way to launch a VACUUM
via autovacuum seems quite useful to me.

For the most part users are expected to just take relfrozenxid
advancement for granted now. They should mostly be able to assume that
VACUUM will do whatever is required to keep it sufficiently current
over time. They can influence VACUUM's behavior, but that mostly works
at the level of the table (not the level of any individual VACUUM
operation). The freezing and skipping strategy stuff should do what is
necessary to keep up in the long run. We don't want to put too much
emphasis on relfrozenxid in the short run, because it isn't a reliable
proxy for how we've kept up with the physical work of freezing --
that's what really matters. It should be okay to "fall behind on table
age" in the short run, provided we don't fall behind on the physical
work of freezing. Those two things shouldn't be conflated.

We now use a separate pair of XID/MXID-based cutoffs to determine
whether or not we're willing to wait for a cleanup lock the hard way
(which can happen in any VACUUM, since of course there is no longer
any special VACUUM with special behaviors). The new pair of cutoffs
replaces the use of FreezeLimit/MultiXactCutoff by lazy_scan_noprune
(those are now only used to decide on what to freeze inside
lazy_scan_prune). Same concept, but with a different, independent
timeline. This was necessary just to get an existing isolation test
(vacuum-no-cleanup-lock) to continue to work. But it just makes sense
to have a different timeline for a completely different behavior. And
it'll be more robust.
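
To make the two timelines concrete, here is a minimal sketch in C. It
is not code from the patch. FreezeLimit and MultiXactCutoff are the
existing names mentioned above; WaitXidCutoff, WaitMxidCutoff, the
struct, and both functions are hypothetical names chosen for
illustration, and the XID comparison ignores PostgreSQL's special
permanent XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

/*
 * Modulo-2^32 comparison: true if a is logically older than b.
 * Simplified: does not special-case permanent XIDs.
 */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Two independent pairs of cutoffs, one pair per behavior. */
typedef struct VacuumCutoffsSketch
{
    /* Decide what to freeze (used when the page can be pruned) */
    TransactionId FreezeLimit;
    MultiXactId   MultiXactCutoff;

    /* Hypothetical: decide whether to wait for a cleanup lock */
    TransactionId WaitXidCutoff;
    MultiXactId   WaitMxidCutoff;
} VacuumCutoffsSketch;

/*
 * Given the oldest XID/MXID found on a page that could not be
 * cleanup-locked, report whether waiting for the lock is justified.
 * Only the "wait" pair is consulted; FreezeLimit plays no role here.
 */
static bool
should_wait_for_cleanup_lock(const VacuumCutoffsSketch *cutoffs,
                             TransactionId page_oldest_xid,
                             MultiXactId page_oldest_mxid)
{
    assert(cutoffs != NULL);

    return xid_precedes(page_oldest_xid, cutoffs->WaitXidCutoff) ||
           xid_precedes(page_oldest_mxid, cutoffs->WaitMxidCutoff);
}
```

In this sketch a page whose oldest XID is older than FreezeLimit but
newer than the hypothetical WaitXidCutoff is a candidate for freezing,
yet is not old enough to justify waiting for a cleanup lock.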

It's a really bad idea for VACUUM to try to wait indefinitely for
a cleanup lock, since that's totally outside of its control. It
typically won't take very long at all for VACUUM to acquire a cleanup
lock, of course, but that is beside the point -- who really cares
what's true on average, for something like this? Sometimes it'll take
hours to acquire a cleanup lock, and there is no telling when that
might happen! And so pausing VACUUM/freezing of all other pages just
to freeze one page makes little sense. Waiting for a cleanup lock
before we really need to is just an overreaction, which risks making
the situation worse. The cure must not be worse than the disease.
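
The per-page policy described above can be sketched roughly as
follows. This is an illustration under assumptions, not the patch's
control flow: the enum, both functions, and the boolean inputs are
hypothetical. The inputs stand in for "a conditional cleanup lock
attempt succeeded" and "the page holds an XID/MXID older than the
wait cutoffs".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum PageActionSketch
{
    PAGE_PRUNE_AND_FREEZE,      /* lock acquired: do the full work */
    PAGE_NOPRUNE_AND_MOVE_ON,   /* no lock, not urgent: keep going */
    PAGE_WAIT_FOR_CLEANUP_LOCK  /* no lock, urgent: wait the hard way */
} PageActionSketch;

/* Wait for the lock only when the page is old enough to need it. */
static PageActionSketch
choose_page_action(bool got_cleanup_lock, bool older_than_wait_cutoff)
{
    if (got_cleanup_lock)
        return PAGE_PRUNE_AND_FREEZE;
    if (older_than_wait_cutoff)
        return PAGE_WAIT_FOR_CLEANUP_LOCK;
    return PAGE_NOPRUNE_AND_MOVE_ON;
}

/* Count the pages left for a later VACUUM instead of stalling on them. */
static size_t
count_deferred_pages(const bool *got_lock, const bool *older_than_cutoff,
                     size_t npages)
{
    size_t deferred = 0;

    assert(npages == 0 || (got_lock != NULL && older_than_cutoff != NULL));

    for (size_t i = 0; i < npages; i++)
    {
        if (choose_page_action(got_lock[i], older_than_cutoff[i]) ==
            PAGE_NOPRUNE_AND_MOVE_ON)
            deferred++;
    }
    return deferred;
}
```

The point of the third outcome being rare is that one pinned page
should not hold up freezing for every other page in the table.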

This revision also resolves problems with freezing MultiXactIds too
lazily [2]. We now always trigger page level freezing in the event of
encountering a Multi. This is more consistent with the behavior on
HEAD, where we can easily process a Multi well before the cutoff
represented by vacuum_multixact_freeze_min_age (e.g., we notice that a
Multi has no members still running, making it safe to remove before
the cutoff is reached).
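
A simplified sketch of that trigger rule, in C. The tuple struct and
function names are hypothetical, and the sketch deliberately leaves
out everything else that can trigger freezing (such as the eager
freezing strategy); it only shows "any Multi on the page triggers
page-level freezing" next to the ordinary FreezeLimit check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down view of a heap tuple header. */
typedef struct TupleSketch
{
    uint32_t xmin;
    uint32_t xmax;          /* XID or MultiXactId; 0 means "not set" */
    bool     xmax_is_multi; /* true when xmax holds a MultiXactId */
} TupleSketch;

/* Modulo-2^32 "a is older than b"; ignores permanent XIDs. */
static bool
xid_is_older(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Does anything on this page trigger page-level freezing? */
static bool
page_triggers_freeze(const TupleSketch *tuples, size_t ntuples,
                     uint32_t freeze_limit)
{
    assert(ntuples == 0 || tuples != NULL);

    for (size_t i = 0; i < ntuples; i++)
    {
        const TupleSketch *tup = &tuples[i];

        /* Any Multi triggers freezing, regardless of its age */
        if (tup->xmax_is_multi && tup->xmax != 0)
            return true;

        /* Ordinary XIDs trigger freezing once older than the cutoff */
        if (xid_is_older(tup->xmin, freeze_limit))
            return true;
        if (tup->xmax != 0 && xid_is_older(tup->xmax, freeze_limit))
            return true;
    }
    return false;
}
```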

Also attaching a prebuilt copy of the "routine vacuuming" docs as of
v5. This is intended to be a convenience for reviewers, or anybody
with a general interest in the patch series. The docs certainly still
need work, but I feel that I'm making progress on that side of things
(especially in this latest revision). Making life easier for DBAs is
the single most important goal of this work, so the user docs are of
central importance. The current "Routine Vacuuming" docs have lots of
problems, but to some extent the problems are with the concepts
themselves.

[1] https://postgr.es/m/CAH2-Wz=DJAokY_GhKJchgpa8k9t_H_OVOvfPEn97jGNr9W=deg@mail.gmail.com
[2] https://postgr.es/m/CAH2-Wz=+B5f1izRDPYKw+sUgOr6=AkWXp2NikU5cub0ftbRQhA@mail.gmail.com
--
Peter Geoghegan

Attachment                                                       Content-Type              Size
routine-vacuuming.html                                           text/html                 43.4 KB
v5-0001-Add-page-level-freezing-to-VACUUM.patch                  application/octet-stream  25.9 KB
v5-0005-Avoid-allocating-MultiXacts-during-VACUUM.patch          application/octet-stream  8.2 KB
v5-0004-Make-VACUUM-s-aggressive-behaviors-continuous.patch      application/octet-stream  96.9 KB
v5-0003-Add-eager-freezing-strategy-to-VACUUM.patch              application/octet-stream  24.5 KB
v5-0006-Size-VACUUM-s-dead_items-space-using-VM-snapshot.patch   application/octet-stream  4.7 KB
v5-0002-Teach-VACUUM-to-use-visibility-map-snapshot.patch        application/octet-stream  30.9 KB
