From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Naga Appani <nagnrik(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring |
Date: | 2025-08-22 11:44:59 |
Message-ID: | CAExHW5sUU1+Vq9oG6x=u85W7Obr-dPfAc7rQtaN-LJw5uufuMA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Aug 22, 2025 at 7:37 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
> | docs can describe the thresholds in terms of counts directly.
>
> Personally, I think it might be fine to keep the gigabyte-based
> description, and perhaps we could show both the number of members and
> gigabytes, since it'd be also helpful to have a sense of the scale.
>
Those who have grown their own utilities to monitor the on-disk usage
will not be able to use the count based thresholds and might take some
time for them to starting using pg_get_multixact_stats(). It makes
sense to mention both the count and the corresponding disk usage
threshold. Something like "Also, if the number of multixact members
exceeds approximately 2^31 entries (occupying roughly more than 10GB
in storage) ... ". Users can choose which threshold they want to use.
Adding disk storage threshold in parenthesis indicates that the count
is more accurate and more useful.
Here's detailed review of the patch
+ Returns statistics about current multixact usage:
+ <literal>num_mxids</literal> is the number of multixact IDs assigned,
+ <literal>num_members</literal> is the number of multixact member
entries created,
+ <literal>members_bytes</literal> is the storage occupied by
<literal>num_members</literal>
I thought mentioning bytes, a unit, in column name members_bytes would
not be appropriate in case we start reporting it in a different unit
like kB in future. But we already have
pg_stat_replication_slots::spill_bytes with similar naming. So may be
it's okay. But I would prefer members_size or members_storage or some
such units-free name.
+ in <literal>pg_multixact/members</literal> directory,
+ <literal>oldest_multixact</literal> is the oldest multixact ID still
in use, and
+ <literal>oldest_offset</literal> is the oldest member offset still in use.
I am not sure whether oldest_offset is worth exposing. It is an
implementation detail. Upthread, Michael suggested to expose oldest
offset from GetMultiXactInfo(), but I don't see him explicitly saying
that we should expose it through this function as well. Michael what
do you think?
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
I still think that this is not needed. There is no reason to restrict
how users want to use this function. We usually don't do that unless
there is a hazard associated with it.
+ <para>
+ This is a live snapshot of shared counters; the numbers can change
between calls,
+ even within the same transaction.
+ </para></entry>
I have not seen the phrase "live snapshot" being used in the
documentation before. How about "The function reports the statistics
at the time of invoking the function. They may vary between calls even
within the same transaction."?
+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+ of members created exceeds approximately 2^31 entries, aggressive vacuum
a member means the transaction participating in a multixact. What you
intend to say is "if the number of multixacts member entries created
...", right?
+ <para>
+ The <function>pg_get_multixact_stats()</function> function, described in
unnecessary pair of commas.
+ This output shows a system with significant multixact activity:
about ~100 million
+ multixact IDs and ~773 million member entries have been created
since the oldest
+ surviving multixact (ID 39974368). By leveraging this information,
the function helps:
+ <orderedlist>
+ <listitem>
... snip ...
+ Detect potential performance impacts before they become critical.
+ For instance, high multixact usage from frequent row-level locking or
+ foreign key operations can lead to increased I/O and CPU overhead during
+ vacuum operations. Monitoring these stats helps tune autovacuum frequency
+ and transaction patterns.
+ </simpara>
+ </listitem>
+ </orderedlist>
I am unsure whether we should be mentioning use cases in such detail.
Users may find other ways to use those counts. I think the following
paragraph should be placed here.
+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
But others may have different opinions.
Maybe you could further write in your example that an aggressive
autovacuum will be triggered in another 10 seconds (or so) if the
number of member entries continues to double every 5 seconds. Or some
practical "usage example" like that.
+ * Returns statistics about current MultiXact usage:
+ * - num_mxids: Number of MultiXact IDs in use
+ * - num_members: Total number of member entries
+ * - oldest_multixact: Oldest MultiXact ID still needed
+ * - oldest_offset: Oldest offset still in use
We don't need to mention each column here, it's evident from the
function body and also from the user facing documentation. Just the
first line is ok.
+ *
+ * Returns a row of NULLs if the MultiXact system is not yet initialized.
tuple or record instead of row.
In the earlier patch you were calling PG_RETURN_NULL(), which I
thought was better. It would get converted into a record of NULLs if
someone is to do SELECT * FROM pg_get_multixact_stats().
I don't think "the MultiXact system is not yet initialized" is the
right description of that condition. GetMultiXactInfo() prologue says
"
Returns false if unable to determine, the oldest offset being
unknown." MultiXactStatData has following comment for oldest offset.
/*
* Oldest multixact offset that is potentially referenced by a multixact
* referenced by a relation. We don't always know this value, so there's
* a flag here to indicate whether or not we currently do.
*/
And also
/* Have we completed multixact startup? */
bool finishedStartup;
I think we need to define this condition more accurately.
And include it in the documentation as well.
+ * Calculate approximate storage space:
+ * - Members are stored in groups of 4
+ * - Each group takes 20 bytes (5 bytes per member)
+ * Note: This ignores small page overhead (12 bytes per 8KB)
+ */
+ membersBytes = (int64) members * 5;
Do we have some constant macros or sizeof(some structure) defined for
5 and 4? That way this computation will be self maintaining and self
documenting.
+ nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
memset(nulls, false, sizeof(nulls)); is better and used everywhere.
In fact, instead of initializing it all to true first and then setting
all to false here, we should memset here and set it to true in else
block.
+++ b/src/test/isolation/specs/multixact_stats.spec
I have not an seen an isolation test being used for testing a stats
function. But I find it useful. Let's see what others think.
@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While
it is pinned
+# by two open transactions, we assert only invariants that background
VACUUM/FREEZE
+# cannot violate:
+# • members increased by ≥ 1 when the second locker arrived,
+# • num_mxids / num_members did not decrease vs earlier snapshots,
+# • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation
may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
What's a driver transaction?
+#
+# Terminology (global counters):
+# num_mxids, num_members : “in-use” deltas derived from global horizons
+# oldest_multixact, offset : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns,
which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
Probably this comment is not needed. But from the sequence of steps
executed, the data is collected when multixact is pinned (what does
that mean?) but the assertions are executed at the end when all the
transactions are committed. Am I correct?
+step snap0 {
+ CREATE TEMP TABLE snap0 AS
+ SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+ FROM pg_get_multixact_stats();
+}
You could use a single table with a primary key column to distinguish
snaps which can be used for joining the rows. Why use a temporary
table? Just setup and tear down the snap table as well?
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
... snip ...
+ (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
+ (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
+ (s1.num_members >= COALESCE(s0.num_members, 0)),
+ (s2.num_members >= COALESCE(s1.num_members, 0))
+ ]
This is getting too complex to follow. It produces pretty output but
the query is complex. Instead just let keys as the columns in the
query. Maybe you could print expanded output if that's possible in an
isolation test.
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Lepikhov | 2025-08-22 11:53:26 | Re: Let plan_cache_mode to be a little less strict |
Previous Message | Amit Kapila | 2025-08-22 11:40:29 | Re: Conflict detection for update_deleted in logical replication |