| From: | Naga Appani <nagnrik(at)gmail(dot)com> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring |
| Date: | 2025-11-05 01:13:09 |
| Message-ID: | CA+QeY+Aja_=j1EuY87L06KaPO4EJqqkS4B+Vg9AsWnGM1d_VRA@mail.gmail.com |
| Lists: | pgsql-hackers |
Thank you for the feedback, Tomas! I agree with the goal you outlined: a
user-friendly “how much space is left” signal would make monitoring far more
actionable.
On Sat, Oct 18, 2025 at 6:18 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> Knowing num_mxids / num_members or members_size is nice, but how would
> I judge how far the system is from hitting some threshold or hard limit?
> Is there some maximum number of mxids/members that we could return? Or
> something like that?
Based on this, I experimented with calculating a num_remaining_members value to
estimate how close the system is to MultiXact member-space exhaustion. I tested
two approaches and validated their behavior through repeated exhaustion cycles.
The results are below.
At the same time, both you and Ashutosh pointed out that oldest_offset exposes
internal implementation details and is not particularly useful on its own, so I
removed oldest_offset in v11.
What I tried for estimating remaining space
===========================================
Approach 1: (offsetStopLimit - nextOffset)
------------------------------------------
I exposed offsetStopLimit from GetMultiXactInfo() and computed:
remainingMembers = offsetStopLimit - nextOffset;
Behavior at exhaustion:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
115409471 | 4294914940 | 1
(1 row)
After wraparound cleanup:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
0 | 0 | 0
(1 row)
The value stayed at 0 until roughly 100k new members were allocated. My reading
is that nextOffset wraps to a small value, while offsetStopLimit remains large
(derived from the oldestOffset at the moment of truncation). Without using the
backend’s wrap-aware comparison logic (MultiXactOffsetWouldWrap()), plain
subtraction crosses the wrap boundary and becomes misleading.
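As a generic illustration of that hazard (this is neither the patch's code nor
MultiXactOffsetWouldWrap(), and it does not try to reproduce the exact zero
reported above), here is a minimal sketch of two ways to measure the distance
between 32-bit counters that live on a ring. The function names and the
clamping behavior are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Naive distance: widen to signed 64-bit, subtract, clamp negatives to zero.
 * This ignores the fact that the counters wrap at 2^32.
 */
static uint32_t
remaining_naive(uint32_t limit, uint32_t next)
{
    int64_t diff = (int64_t) limit - (int64_t) next;

    return diff < 0 ? 0 : (uint32_t) diff;
}

/*
 * Wrap-aware distance: number of steps going forward from next to limit
 * on a ring of size 2^32. Unsigned subtraction is modular in C, so the
 * result stays correct when limit has already wrapped past zero.
 */
static uint32_t
remaining_modular(uint32_t limit, uint32_t next)
{
    return limit - next;
}
```

The two functions agree whenever next <= limit numerically. Once the wrap
boundary lies between the two values, the naive version reports 0 while the
modular version still reports the forward distance.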
Approach 2: (MaxMultiXactOffset - members)
------------------------------------------
I also tested:
remainingMembers = MaxMultiXactOffset - members;
Across three exhaustion cycles:
1st attempt:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
125098473 | 4294914940 | 52355
(1 row)
2nd attempt:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
116285530 | 4294905729 | 61566
(1 row)
3rd attempt:
postgres=# SELECT num_mxids,num_members,remaining_members
FROM pg_get_multixact_stats();
num_mxids | num_members | remaining_members
-----------+-------------+-------------------
111973488 | 4294862592 | 104703
(1 row)
The system correctly rejected inserts in each cycle, but the computed
“remaining” value increased between cycles. This seems to match the dynamic
nature of offsetStopLimit, which appears to be recomputed after truncation:
- based on the new oldestOffset
- aligned back to the start of its segment
- with one safety segment subtracted
Because the stop boundary shifts depending on segment boundaries, the plain
(Max − members) formula reflects alignment effects rather than actual remaining
capacity.
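A rough sketch of that arithmetic, under stated assumptions, may make the
alignment effect easier to see. The helper names are hypothetical,
MEMBERS_PER_SEGMENT assumes a default build (1636 members per page times 32
pages per segment), and MAX_MULTIXACT_OFFSET is taken as 2^32 - 1, which is
consistent with the three "remaining" figures above. This is not the backend
code, only the three steps in the list applied to 32-bit offsets.

```c
#include <stdint.h>

/* Assumption: largest member offset is 2^32 - 1. */
#define MAX_MULTIXACT_OFFSET UINT32_C(0xFFFFFFFF)

/* Assumption: members per SLRU segment in a default build (1636 * 32). */
#define MEMBERS_PER_SEGMENT UINT32_C(52352)

/* Approach 2: plain subtraction from the maximum offset. */
static uint32_t
remaining_plain(uint32_t members)
{
    return MAX_MULTIXACT_OFFSET - members;
}

/*
 * Stop limit as described above: start from oldestOffset, align back to
 * the start of its segment, then subtract one safety segment. The uint32
 * arithmetic wraps modulo 2^32, as offsets do.
 */
static uint32_t
stop_limit(uint32_t oldest_offset)
{
    uint32_t limit = oldest_offset - (oldest_offset % MEMBERS_PER_SEGMENT);

    return limit - MEMBERS_PER_SEGMENT;
}

/* Members that fit between oldestOffset and its stop limit on the ring. */
static uint32_t
usable_before_stop(uint32_t oldest_offset)
{
    return stop_limit(oldest_offset) - oldest_offset;
}
```

With these assumptions, remaining_plain(4294914940) gives 52355, matching the
first cycle. usable_before_stop(0) is 4294914944, while
usable_before_stop(100000) is 4294867296, so the usable space differs by
47648 members purely because of where oldestOffset falls within its segment.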
Understanding
=============
Based on reading the relevant parts of multixact.c and observing the runtime
behavior, both approaches seem to run into limitations when trying to derive a
“remaining members” value outside the backend. I may be missing details, but the
behavior I observed suggests that a reliable computation might require
duplicating several internal mechanisms, including:
- wrap-aware offset comparison
- SLRU page and segment alignment rules
- SetOffsetVacuumLimit’s segment recalculation
Without accounting for those, the derived numbers behaved inconsistently across
tests, sometimes staying at 0 until a large jump, and in other cases increasing
between exhaustion cycles. This seems broadly consistent with your concern that
simple arithmetic on these counters does not match how the backend determines
wraparound risk.
To be clear, this interpretation is based only on what I could infer from the
code and testing, and I may not be capturing the entire picture. But from what I
observed, a user-visible “remaining members” metric does not seem
straightforward without exposing or replicating backend logic.
My thoughts
===========
Given all this, the cleanest approach appears to be not exposing a “remaining
members” counter directly.
PostgreSQL has historically avoided exposing remaining-capacity counters for
wraparound-limited resources such as:
- transaction IDs
- MultiXact IDs
- OIDs
Instead, PostgreSQL exposes current usage and relies on documented thresholds
for monitoring. Following that established pattern avoids tying a SQL-visible
interface to backend internals that may evolve over time.
Self-monitoring based on documented limits
==========================================
Monitoring then follows the same pattern PostgreSQL already uses for XIDs and
other wraparound-limited values:
- track num_members growth over time
- warn when it exceeds roughly 2^31
- treat values approaching 2^32 as exhaustion-risk territory
- observe the growth rate to estimate when intervention may be needed
This keeps the interface simple, stable, and aligned with existing PostgreSQL
behavior.
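For illustration only, the checks in the list above could be sketched on the
monitoring side roughly as follows. The function and enum names are
hypothetical, and the 90% cut-off for "approaching 2^32" is an assumption
chosen for the example rather than a documented threshold. The time estimate
is a plain linear extrapolation from two samples of num_members.

```c
#include <stdint.h>

#define MEMBERS_WARN        (UINT64_C(1) << 31)     /* warn above 2^31 */
#define MEMBERS_LIMIT       (UINT64_C(1) << 32)     /* member space size */
/* Assumption: treat the last 10% before 2^32 as exhaustion-risk territory. */
#define MEMBERS_CRITICAL_AT (MEMBERS_LIMIT / 10 * 9)

typedef enum
{
    MEMBERS_OK,
    MEMBERS_WARNING,
    MEMBERS_CRITICAL
} MembersLevel;

/* Map a sampled num_members value to an alert level. */
static MembersLevel
classify_members(uint64_t num_members)
{
    if (num_members >= MEMBERS_CRITICAL_AT)
        return MEMBERS_CRITICAL;
    if (num_members > MEMBERS_WARN)
        return MEMBERS_WARNING;
    return MEMBERS_OK;
}

/*
 * Estimate seconds until num_members would reach 2^32, given two samples
 * taken elapsed_seconds apart. Returns 0 if the limit is already reached
 * and -1 if there is no growth to extrapolate from.
 */
static double
seconds_until_limit(uint64_t earlier, uint64_t later, double elapsed_seconds)
{
    double rate;

    if (later >= MEMBERS_LIMIT)
        return 0.0;
    if (later <= earlier || elapsed_seconds <= 0.0)
        return -1.0;

    rate = (double) (later - earlier) / elapsed_seconds;
    return (double) (MEMBERS_LIMIT - later) / rate;
}
```

A monitoring job would feed successive num_members samples from
pg_get_multixact_stats() into these helpers. Because vacuum can shrink
num_members between samples, a negative result only means there was no
growth over that interval.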
Why oldest_offset was removed
=============================
Both you and Ashutosh pointed out that oldest_offset reflects internal SLRU
geometry and is not actionable without reproducing backend logic. Combined with
the behavior seen in the experiments above, it made sense not to expose this
field in the user-visible API. It is removed in v11.
Final shape of the function (v11)
=================================
The function now returns:
- num_mxids
- num_members
- members_size
- oldest_multixact
These fields are stable, directly interpretable, and do not depend on SLRU
internals or wrap-aware arithmetic.
On Thu, Oct 16, 2025 at 9:10 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> Here’s the updated v10 patch, now including access/htup_details.h in
> src/backend/utils/adt/multixactfuncs.c.
Thank you!
On Thu, Oct 16, 2025 at 7:28 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> Could you please update the patch to fix this?
Thank you for raising this and bringing it to my attention!
Attached is the v11.
Best regards,
Naga
| Attachment | Content-Type | Size |
|---|---|---|
| v11-0001-Add-pg_get_multixact_stats-function-for-monitori.patch | application/octet-stream | 17.6 KB |