Re: Enhancing Memory Context Statistics Reporting

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Rahila Syed <rahilasyed90(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Enhancing Memory Context Statistics Reporting
Date: 2026-02-10 04:50:53
Message-ID: AED12415-E763-433F-B9C4-AFD6519A8F8B@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Feb 10, 2026, at 08:03, Rahila Syed <rahilasyed90(at)gmail(dot)com> wrote:
>
> Hi,
>
> Please find attached rebased and updated patches that include a few fixes
> suggested by Daniel.
>
> Fixes include
> 1. Changed LW_SHARED to LW_EXCLUSIVE when resetting client_keys.
> 2. Replaced calls to MemoryContextReset with MemoryContextDelete since a
> new memory context is created each time.
> 3. Removed leftover code from earlier versions in ipci.c
> 4. Fix the TAP test to compare strings correctly.
> 5. Added more comments in the TAP test.
>
> Thank you,
> Rahila Syed
> <v50-0001-Add-function-to-report-memory-context-statistics.patch><v50-0002-Test-module-to-test-memory-context-reporting-wit.patch>

Hi Rahila,

Thanks for the patch. I applied v50 locally and played with it a little bit then I reviewed v50. Basically I think it’s convenient to query a server process’ memory usage via a SQL statement.

Here comes my review comments.

1 - 0001- mcxt.c
```
+/*
+ * MemoryContextStatsCounter
+ *
+ * Accumulate statistics counts into *totals. totals should not be NULL.
+ * This involves a non-recursive tree traversal.
+ */
+void
+MemoryContextStatsCounter(MemoryContext context, MemoryContextCounters *totals,
+ int *num_contexts)
+{
+ int ichild = 1;
+
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
```

As the header comment says that “totals should not be NULL”, maybe add Assert(total!=NULL) to ensure that.

2 - 0001- mcxt.c
```
+ *num_contexts = 0;
+ context->methods->stats(context, NULL, NULL, totals, false);
+
+ for (MemoryContext curr = context->firstchild;
+ curr != NULL;
+ curr = MemoryContextTraverseNext(curr, context))
+ {
+ curr->methods->stats(curr, NULL, NULL, totals, false);
+ ichild++;
+ }
+
+ /*
+ * Add the count of all the children contexts which are traversed
+ * including the parent.
+ */
+ *num_contexts = *num_contexts + ichild;
+}
```

*num_contexts is only initialized to 0, then *num_contexts = *num_contexts + ichild. Looks like the the initialization is unnecessary, and the assignment can just be *num_contexts = ichild.

3 - 0001 - mcxtfuncs.c
```
+typedef struct MemoryStatsEntry
+{
+ char name[MEMORY_CONTEXT_NAME_SHMEM_SIZE];
+ char ident[MEMORY_CONTEXT_IDENT_SHMEM_SIZE];
+ int path[100];
```

You already defined a constant MAX_PATH_DISPLAY_LENGTH below, would it make sense to pull up the macro definition and use it for “path” definition?

4 - 0001 - mctxfuncs.c
```
+/*
+ * Per backend dynamic shared hash entry for memory context statistics
+ * reporting.
+ */
+typedef struct MemoryStatsDSHashEntry
+{
+ char key[64];
+ ConditionVariable memcxt_cv;
+ bool stats_written;
+ int target_server_id;
+ int total_stats;
+ bool summary;
+ dsa_pointer memstats_dsa_pointer;
+} MemoryStatsDSHashEntry;
+
+static const dshash_parameters memctx_dsh_params = {
+ offsetof(MemoryStatsDSHashEntry, memcxt_cv),
+ sizeof(MemoryStatsDSHashEntry),
+ dshash_strcmp,
+ dshash_strhash,
+ dshash_strcpy
+};
```

I wonder why we cannot just use the integer ProcNumber as the hash key? I think dshash fully supports fixed-size binary keys. We can use dshash_memcmp and dshash_memhash for compare and hash functions.

5 - 0001 - mctxfuncs.c
```
+static List *
+compute_context_path(MemoryContext c, HTAB *context_id_lookup)
+{
+ bool found;
+ List *path = NIL;
+ MemoryContext cur_context;
+
+ for (cur_context = c; cur_context != NULL; cur_context = cur_context->parent)
+ {
+ MemoryContextId *cur_entry;
+
+ cur_entry = hash_search(context_id_lookup, &cur_context, HASH_FIND, &found);
+
+ if (!found)
+ {
+ elog(NOTICE, "hash table corrupted, can't construct path value");
+ return NIL;
+ }
+ path = lcons_int(cur_entry->context_id, path);
+ }
+ return path;
+}
```

As in the hash entry, path is a 100-elements array, would it make sense to also limit the path list to not append more than 100 items in this function?

6 - 0001 - mctxfuncs.c
```
+ entry->stats_written = true;
+ dshash_release_lock(MemoryStatsDsHash, entry);
+ hash_destroy(context_id_lookup);
+
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextDelete(memstats_ctx);
+ /* Notify waiting client backend and return */
+ ConditionVariableSignal(&entry->memcxt_cv);
+}
```

This looks like a race condition. ConditionVariableSignal(&entry->memcxt_cv); is called after the lock is released. So, there is a chance that a process has been terminated, and its before_shmem_exit callback acquired the entry lock and deleted the entry from the hash. So, I think we should send the signal before releasing the lock.

7 - 0001 - mctxfuncs.c
```
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
```

For the wait loop, if the condition wakes up for some other reason, it will loop back and wait for the other 5 seconds, then total wait period will exceed 5 seconds, maybe 9 seconds. This is not a big deal, but the doc explicitly says “If the process does not respond with memory contexts statistics in 5 seconds”, so that behavior might be inconsistent with the doc.

I think we can calculate remaining time and pass remaining milliseconds into ConditionVariableTimedSleep.

8 - 0001 - mctxfuncs.c
```
+ /*
+ * Wait for MEMORY_STATS_MAX_TIMEOUT. If no statistics are available
+ * within the allowed time then return NULL. The timer is defined in
+ * milliseconds since that's what the condition variable sleep uses.
+ */
+ if (ConditionVariableTimedSleep(&entry->memcxt_cv,
+ (MEMORY_STATS_MAX_TIMEOUT * 1000),
+ WAIT_EVENT_MEM_CXT_PUBLISH))
+ {
+ /* Timeout has expired, return NULL */
+ memstats_dsa_cleanup(key);
+ memstats_client_key_reset(procNumber);
+ ConditionVariableCancelSleep();
+ ereport(NOTICE,
+ errmsg("request for memory context statistics for PID %d timed out",
+ pid));
+ PG_RETURN_NULL();
+ }
+ entry = dshash_find_or_insert(MemoryStatsDsHash, key, &found);
+ Assert(found);
```

After ConditionVariableTimedSleep, rather than Assert(found), I think we should check if (!found). Because it waits for 5 seconds without holding the lock, the process may terminate and delete the entry during the period.

9 - 0001 - proc.c
```
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 8560a903bc8..f68583aa820 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -51,6 +51,7 @@
#include "storage/procsignal.h"
#include "storage/spin.h"
#include "storage/standby.h"
+#include "utils/memutils.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
```

This file is sololy added an include without any other change, that seems unneeded. I tried to remove this include, and the build still passed.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-02-10 04:52:43 Re: recovery.signal not cleaned up when both signal files are present
Previous Message Michael Paquier 2026-02-10 03:56:51 Re: Add expressions to pg_restore_extended_stats()