Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-09-08 20:23:52
Message-ID: CAH2-Wzn6dhY3M4MomiABKZDxVC4m=7ub3A8cy3_13j2pJTv1=w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 31, 2022 at 12:03 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Actually I'm also ignoring some subtleties with Multis that could make
> this not quite happen, but again, that's only a super obscure corner case.
> The idea that just setting vacuum_freeze_min_age = 0 and
> vacuum_multixact_freeze_min_age = 0 will be enough is definitely true
> in spirit. You don't need to touch vacuum_freeze_table_age (if you did
> then you'd get aggressive VACUUMs, and one goal here is to avoid
> those whenever possible -- especially aggressive antiwraparound
> autovacuums).

Attached is v3. There is a new patch included here -- v3-0004-*patch,
or "Unify aggressive VACUUM with antiwraparound VACUUM". No other
notable changes.

I decided to work on this now because it seems like it might give a
more complete picture of the high level direction that I'm pushing
towards. Perhaps this will make it easier to review the patch series
as a whole, even. The new patch unifies the concept of antiwraparound
VACUUM with the concept of aggressive VACUUM. Now there is only
antiwraparound and regular VACUUM (uh, barring VACUUM FULL). And now
antiwraparound VACUUMs are not limited to antiwraparound autovacuums
-- a manual VACUUM can also be antiwraparound (that's just the new
name for "aggressive").

We will typically only get antiwraparound vacuuming in a manual
VACUUM when the user goes out of their way to get that behavior.
VACUUM FREEZE is the best example. For the most part the
skipping/freezing strategy stuff has a good sense of what matters
already, and shouldn't need to be guided very often.

The patch relegates vacuum_freeze_table_age to a compatibility option,
making its default -1, meaning "just use autovacuum_freeze_max_age". I
always thought that having two table age based GUCs was confusing.
There was a period between 2006 and 2009 when we had
autovacuum_freeze_max_age, but did not yet have
vacuum_freeze_table_age. This change can almost be thought of as a
return to the simpler user interface that existed at that time. Of
course we must not resurrect the problems that vacuum_freeze_table_age
was intended to address (see originating commit 65878185) by mistake.
We need an improved version of the same basic concept, too.
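
To make the "-1 means just use autovacuum_freeze_max_age" behavior
concrete, here is a minimal sketch in C. It is not code from the patch:
the function name effective_freeze_table_age() is hypothetical, and
capping an explicit setting at autovacuum_freeze_max_age is an
assumption made only for this illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): resolve the table age cutoff
 * that a VACUUM would work with.  A setting of -1 means "no separate
 * cutoff, just use autovacuum_freeze_max_age".
 */
int
effective_freeze_table_age(int freeze_table_age, int autovac_freeze_max_age)
{
    assert(autovac_freeze_max_age > 0);
    assert(freeze_table_age >= -1);

    /* Compatibility default: defer entirely to autovacuum_freeze_max_age */
    if (freeze_table_age == -1)
        return autovac_freeze_max_age;

    /*
     * Assumption of this sketch: an explicit setting is honored, but is
     * never allowed to exceed autovacuum_freeze_max_age.
     */
    if (freeze_table_age > autovac_freeze_max_age)
        return autovac_freeze_max_age;

    return freeze_table_age;
}
```

With the default of -1, a user who only ever tunes
autovacuum_freeze_max_age gets one table age knob instead of two.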

The patch more or less replaces the table-age-aggressive-escalation
concept (previously implemented using vacuum_freeze_table_age) with
new logic that makes vacuumlazy.c's choice of skipping strategy *also*
depend upon table age -- it is now one more factor to be considered.
Both costs and benefits are weighed here. We now give just a little
weight to table age at a relatively early stage (XID-age-wise early),
and escalate from there. As the table's relfrozenxid gets older and
older, we give less and less weight to putting off the cost of
freezing. This general approach is possible because the false
dichotomy that is "aggressive vs non-aggressive" has mostly been
eliminated. This makes things less confusing for users and hackers.
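
As a rough illustration of table age being "one more factor", here is
a sketch in C of how the tolerance for extra scanning work could grow
with table age. Everything in it is hypothetical: the function names,
the linear scaling, and the 5% base tolerance are assumptions made for
the example, not the patch's actual algorithm (whose details are still
unsettled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: how much of autovacuum_freeze_max_age the table's
 * relfrozenxid age has used up, clamped to the range [0.0, 1.0].
 */
double
table_age_pressure(uint32_t table_age, uint32_t freeze_max_age)
{
    assert(freeze_max_age > 0);

    if (table_age >= freeze_max_age)
        return 1.0;
    return (double) table_age / (double) freeze_max_age;
}

/*
 * Hypothetical: decide whether VACUUM should give up skipping
 * all-visible pages so that relfrozenxid can be advanced.
 *
 * extra_scan_fraction is the share of the table's pages that would be
 * scanned only because skipping was given up (the cost side).  The
 * tolerated cost starts at a small base value for a young table and
 * rises linearly to 1.0 (scan everything) as table age approaches
 * freeze_max_age.
 */
bool
should_scan_all_visible(uint32_t table_age, uint32_t freeze_max_age,
                        double extra_scan_fraction)
{
    const double base_tolerance = 0.05; /* made-up constant */
    double      pressure = table_age_pressure(table_age, freeze_max_age);
    double      tolerance = pressure + base_tolerance * (1.0 - pressure);

    return extra_scan_fraction <= tolerance;
}
```

In this sketch a young table only advances relfrozenxid when doing so
is nearly free, while a table at or past the age limit always does the
extra work, with no hard switch between the two behaviors.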

The details of the skipping-strategy-choice algorithm are still
unsettled in v3 (no real change there). ISTM that the important thing
is still the high level concepts. Jeff was slightly puzzled by the
emphasis placed on the cost model/strategy stuff, at least at one
point. Hopefully my intent will be made clearer by the ideas featured
in the new patch. The skipping strategy decision making process isn't
particularly complicated, but it now looks more like an optimization
problem of some kind or other.

It might make sense to go further in the same direction by making
"regular vs aggressive/antiwraparound" into a *strict* continuum. In
other words, it might make sense to get rid of the two remaining cases
where VACUUM conditions its behavior on whether this VACUUM operation
is antiwraparound/aggressive or not. I'm referring to the cleanup lock
skipping behavior around lazy_scan_noprune(), as well as the
PROC_VACUUM_FOR_WRAPAROUND no-auto-cancellation behavior enforced in
autovacuum workers. We will still need to keep roughly the same two
behaviors, but the timelines can be totally different. We must be
reasonably sure that the cure won't be worse than the disease -- I'm
aware of quite a few individual cases that felt that way [1].
Aggressive interventions can make sense, but they need to be
proportionate to the problem that's right in front of us. "Kicking the
can down the road" is often the safest and most responsible approach
-- it all depends on the details.

[1] https://www.tritondatacenter.com/blog/manta-postmortem-7-27-2015
is the most high profile example, but I have personally been called in
to deal with similar problems in the past.

--
Peter Geoghegan

Attachment                                                        Content-Type         Size
v3-0001-Add-page-level-freezing-to-VACUUM.patch                   application/x-patch  27.3 KB
v3-0005-Avoid-allocating-MultiXacts-during-VACUUM.patch           application/x-patch  7.3 KB
v3-0003-Add-eager-freezing-strategy-to-VACUUM.patch               application/x-patch  24.5 KB
v3-0004-Unify-aggressive-VACUUM-with-antiwraparound-VACUU.patch   application/x-patch  84.7 KB
v3-0002-Teach-VACUUM-to-use-visibility-map-snapshot.patch         application/x-patch  30.9 KB
