Re: mxid_score can become Infinity in pg_stat_autovacuum_scores

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mxid_score can become Infinity in pg_stat_autovacuum_scores
Date: 2026-06-15 18:20:50
Message-ID: CALj2ACW6az3gpwZVzJwY2DHnjjEHb7LVhoNapk9AKD8FR6oRrA@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, Jun 12, 2026 at 11:20 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> While testing the autovacuum score, I noticed that scores->mxid could
> be infinity by the following calculation in
> relation_needs_vacanalyze():
>
> scores->mxid = (double) mxid_age / multixact_freeze_max_age;

Nice catch!

On Fri, Jun 12, 2026 at 1:38 PM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
>
> If multixact_freeze_max_age is 0 (we have more than
> MULTIXACT_MEMBER_HIGH_THRESHOLD members, 4 billion)
> we then prioritize based on mxid_age, which will be high at that
> point for most cases and put that table high on the priority list.

Commit bd8d9c9bdf eliminated MultiXactOffset wraparound and the 2^32
limit on the total number of multixact members (i.e., the number of
transaction IDs that belong to all multixacts at any given moment).
However, to limit disk space usage, it retained the aggressive multixact
freezing logic (with a note to make it configurable in the future). This
means that when the total number of multixact members exceeds 4 billion,
we can hit a condition where the computed fraction is >= 1.0 and the
returned freeze threshold is 0, telling the caller that freezing is
urgent on this table.
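
To make the condition above concrete, here is a rough standalone sketch
of how a member-based freeze threshold could ramp down to 0. This is not
the PostgreSQL source: the function name, the linear ramp, and the two
constants (taken from the "2 billion" and "4 billion" figures mentioned
in this thread) are assumptions for illustration only.

```c
#include <stdint.h>

/* Assumed values, based on the 2 billion / 4 billion figures in this thread. */
#define SKETCH_MEMBER_LOW_THRESHOLD  UINT64_C(2000000000)
#define SKETCH_MEMBER_HIGH_THRESHOLD UINT64_C(4000000000)

/*
 * Hypothetical, simplified model of a member-based freeze threshold.
 * Returns freeze_max_age while member usage is low, a reduced value while
 * usage is between the two thresholds, and 0 once the fraction reaches 1.0.
 */
static int
sketch_member_freeze_threshold(uint64_t members, uint32_t multixacts,
                               int freeze_max_age)
{
    double fraction;
    double target;

    /* Low member usage: no special action, use the configured max age. */
    if (members <= SKETCH_MEMBER_LOW_THRESHOLD)
        return freeze_max_age;

    /* How far we are between the low and high thresholds. */
    fraction = (double) (members - SKETCH_MEMBER_LOW_THRESHOLD) /
        (double) (SKETCH_MEMBER_HIGH_THRESHOLD - SKETCH_MEMBER_LOW_THRESHOLD);

    /* At or past the high threshold: freezing is urgent. */
    if (fraction >= 1.0)
        return 0;

    /* Keep only the remaining share of multixacts, never above the max age. */
    target = (double) multixacts * (1.0 - fraction);
    return target < (double) freeze_max_age ? (int) target : freeze_max_age;
}
```

In this model, any caller that divides an age by the returned value has
to handle the 0 case explicitly once members reach the high threshold.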

When this happens, we want the table to be vacuumed regardless of its
other scores. However, with just setting scores->mxid = mxid_age (as in
the attached patch), unless I'm missing something, there seems to be a
risk that the table won't get to the top of the priority list, because
scores->max gets recalculated even after the mxid score has been
accounted for via the max of (xid, mxid). Could you help me understand
how this case is handled?
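
For reference, a minimal sketch of the two behaviors being discussed,
assuming IEEE 754 doubles. Both helper names are hypothetical and this
is not the attached patch; it only contrasts the unguarded division with
a fallback to the raw age. It does not model how scores->max is later
recalculated, which is the part I'm asking about.

```c
#include <math.h>
#include <stdint.h>

/*
 * Hypothetical: the unguarded form. With a zero threshold and a nonzero
 * age, IEEE 754 division yields +Infinity.
 */
static double
sketch_mxid_score_unguarded(uint32_t mxid_age, int multixact_freeze_max_age)
{
    return (double) mxid_age / multixact_freeze_max_age;
}

/*
 * Hypothetical: the guarded form. A zero threshold means freezing is
 * urgent, so fall back to the raw age instead of dividing by zero.
 */
static double
sketch_mxid_score(uint32_t mxid_age, int multixact_freeze_max_age)
{
    if (multixact_freeze_max_age <= 0)
        return (double) mxid_age;

    return (double) mxid_age / multixact_freeze_max_age;
}
```

With the guarded form, a table whose mxid_age is 300 million gets a
finite score of 300000000.0 when the threshold is 0, compared with
scores around 1.0 for tables that have just reached a nonzero threshold.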

> I do think we need to mention in the docs also about this caveat
> in scoring, so users of pg_stat_autovacuum_scores are not surprised.
> As member space usage grows between 2 billion and 4 billion, the
> score ramps up gradually, but once members reach 4 billion the effective freeze
> max age drops to 0 and the score jumps to mxid_age itself,
> which could be in the hundreds of millions.

I didn't find any documentation added by commit bd8d9c9bdf. Maybe it's
worth adding some notes on what it means for users with
multixact-heavy workloads, especially that it eliminates the
anti-wraparound freezing caused by running out of member space.

> See attached.

Thanks for the patch. Some comments:

1/
+ <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>. However,
+ when multixact member space usage is high (see
+ <xref linkend="vacuum-for-multixact-wraparound"/>), the effective
+ freeze max age is reduced below
+ <xref linkend="guc-autovacuum-multixact-freeze-max-age"/> to help
+ reclaim multixact member disk space, which can result in much higher
+ scores than normal. Furthermore, this component increases greatly
+ once the age surpasses
+ <xref linkend="guc-vacuum-multixact-failsafe-age"/>. The
+ final value for this component can be adjusted via

Isn't "effective freeze max age" code-level terminology? IMHO, adding a
separate section that covers the behavior introduced by commit
bd8d9c9bdf would be more useful.

2/
/*
* To calculate the (M)XID age portion of the score, divide the age by its
- * respective *_freeze_max_age parameter.
+ * respective *_freeze_max_age parameter. MultiXactMemberFreezeThreshold()
+ * can return 0, in which case we effectively use mxid_age as the score.
*/
xid_age = TransactionIdIsNormal(relfrozenxid) ? recentXid - relfrozenxid : 0;
mxid_age = MultiXactIdIsValid(relminmxid) ? recentMulti - relminmxid : 0;

For better readability, can we enhance this comment by saying exactly
when the freeze threshold is returned as 0, which tells the caller that
freezing is urgent on this table?

3/ I looked for tests covering the case where the fraction is >= 1.0,
i.e., where multixact members exceed 4 billion. The closest I found is
002_multixact_wraparound.pl, but I don't think it covers this case. It's
worth testing this case and the fix locally. FWIW, this code doesn't
have coverage:
https://coverage.postgresql.org/src/backend/access/transam/multixact.c.gcov.html

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
