Re: effective_multixact_freeze_max_age issue

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>
Subject: Re: effective_multixact_freeze_max_age issue
Date: 2022-08-29 17:25:50
Message-ID: CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 28, 2022 at 4:14 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> The idea seems sound to me, and IMO your patch simplifies things nicely,
> which might be reason enough to proceed with it.

It is primarily a case of making things simpler. Why would it ever
make sense to interpret age differently in the presence of a long
running transaction, though only for the FreezeLimit/MultiXactCutoff
cutoff calculation? And not for the closely related
freeze_table_age/multixact_freeze_table_age calculation? It's hard to
imagine that that was ever a deliberate choice.

vacuum_set_xid_limits() didn't contain the logic for determining if
its caller's VACUUM should be an aggressive VACUUM until quite
recently. Postgres 15 commit efa4a9462a put the logic for determining
aggressiveness right next to the logic for determining FreezeLimit,
which made the inconsistency much more noticeable. It is easy to
believe that this was really just an oversight, all along.

> However, I'm struggling
> to understand when this change would help much in practice. IIUC it will
> cause vacuums to freeze a bit more, but outside of extreme cases (maybe
> when vacuum_freeze_min_age is set very high and there are long-running
> transactions), ISTM it might not have tremendously much impact. Is the
> intent to create some sort of long-term behavior change for autovacuum, or
> is this mostly aimed towards consistency among the cutoff calculations?

I agree that this will have only a negligible impact on the majority
(perhaps even the vast majority) of applications. The primary
justification for this patch (simplification) seems sufficient, all
things considered. Even still, it's possible that it will help in
extreme cases. Cases with pathological performance issues,
particularly those involving MultiXacts.

We already set FreezeLimit to the most aggressive possible value of
OldestXmin when OldestXmin has itself already crossed a quasi
arbitrary XID-age threshold of autovacuum_freeze_max_age XIDs (i.e.
when OldestXmin < safeLimit), with analogous rules for
MultiXactCutoff/OldestMxact. Consequently, the way that we set the
cutoffs for freezing in pathological cases where (say) there is a
leaked replication slot will see a sharp discontinuity in how
FreezeLimit is set, within and across tables. And for what?

Initially, these pathological cases will end up using exactly the same
FreezeLimit for every VACUUM against every table (assuming that we're
using a system-wide min_freeze_age setting) -- every VACUUM operation
will use a FreezeLimit of `OldestXmin - vacuum_freeze_min_age`. At a
certain point that'll suddenly flip -- now every VACUUM operation will
use a FreezeLimit of `OldestXmin`. OTOH with the patch they'd all have
a FreezeLimit that is tied to the time that each VACUUM started --
which is exactly the FreezeLimit behavior that we'd get if there was
no leaked replication slot (at least until OldestXmin finally attains
an age of vacuum_freeze_min_age, when it finally becomes unavoidable,
even with the patch).

There is something to be said for preserving the "natural diversity"
of the relfrozenxid values among tables, too. The FreezeLimit we use
is (at least for now) almost always going to be very close to (if not
exactly) the same value as the NewFrozenXid value that we set
relfrozenxid to at the end of VACUUM (at least in larger tables).
Without the patch, a once-off problem with a leaked replication slot
can accidentally result in lasting problems where all of the largest
tables get their antiwraparound autovacuums at exactly the same time.
The current behavior increases the risk that we'll accidentally
"synchronize" the relfrozenxid values for large tables that had an
antiwraparound vacuum during the time when OldestXmin was held back.

Needlessly using the same FreezeLimit across each VACUUM operation
risks disrupting the natural ebb and flow of things. It's hard to say
how much of a problem that is in the real word. But why take any
chances?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2022-08-29 18:04:13 Re: logical decoding and replication of sequences
Previous Message Nathan Bossart 2022-08-29 17:17:12 Re: use ARM intrinsics in pg_lfind32() where available