Re: Enhancing Memory Context Statistics Reporting

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Rahila Syed <rahilasyed90(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enhancing Memory Context Statistics Reporting
Date: 2025-12-08 01:26:33
Message-ID: 44a9d3857aaed49b818dc893dde9f6b4@oss.nttdata.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2025-11-28 18:22, Rahila Syed wrote:

Hi,

> I'm attaching the updated patches, which primarily include cleanup and
> have been rebased
> following the CFbot report.

Thanks for updating the patch!

I observed an assertion failure when forcing a timeout as follows:

```
$ psql
(pid:38587)=#

$ kill -s SIGSTOP 38587

$ psql
(pid:38618) =# select * from pg_get_process_memory_contexts(38587,
false);
name | ident | type | level | path | total_bytes |
total_nblocks | free_bytes | free_chunks | used_bytes | num_agg_contexts

--------+--------+--------+--------+--------+-------------+---------------+------------+-------------+------------+------------------
[NULL] | [NULL] | [NULL] | [NULL] | [NULL] | [NULL] |
[NULL] | [NULL] | [NULL] | [NULL] | [NULL]
(1 row)

Time: 5013.515 ms (00:05.014)

$ kill -s SIGCONT 38587

$ tail postgresql.log

TRAP: failed Assert("client_keys[MyProcNumber] != -1"), File:
"mcxtfuncs.c", Line: 881, PID: 38587
0 postgres 0x0000000104943400
ExceptionalCondition + 216
1 postgres 0x000000010480f738
ProcessGetMemoryContextInterrupt + 140
2 postgres 0x00000001046f2710
ProcessInterrupts + 3008
3 postgres 0x00000001046f1a78
ProcessClientReadInterrupt + 80
4 postgres 0x0000000104433994 secure_read
+ 404
5 postgres 0x00000001044411dc pq_recvbuf
+ 260
6 postgres 0x0000000104441088 pq_getbyte
+ 96
7 postgres 0x00000001046fa0fc
SocketBackend + 44
8 postgres 0x00000001046f6d3c ReadCommand
+ 44
9 postgres 0x00000001046f6284
PostgresMain + 2900
10 postgres 0x00000001046ed558
BackendInitialize + 0
11 postgres 0x00000001045c0a48
postmaster_child_launch + 456
12 postgres 0x00000001045c8520
BackendStartup + 304
13 postgres 0x00000001045c636c ServerLoop
+ 372
14 postgres 0x00000001045c4e24
PostmasterMain + 6448
15 postgres 0x0000000104445b4c main + 924
16 dyld 0x0000000188662b98 start +
6076
2025-12-08 07:35:32.608 JST [38540] LOG: 00000: client backend (PID
38587) was terminated by signal 6: Abort trap: 6
2025-12-08 07:35:32.608 JST [38540] LOCATION: LogChildExit,
postmaster.c:2872
```

Below are comments regarding the v42-0001 patch:

> In order to not block on busy processes, we have hardcoded
> the number of seconds during which to retry before timing out.
> In the case where no statistics are published within the set
> timeout, NULL is returned.

It might be good to also document in func-admin.sgml that the function
times out after 5 seconds when the target backend does not respond, and
that in such a case NULLs are returned.

+ * If DSA exists, created by another process requesting statistics,
attach
+ * to it. We expect the client process to create required DSA and
Dshash
+ * table.
+ */
+ if (MemoryStatsDsaArea == NULL)
+ MemoryStatsDsaArea =
GetNamedDSA("memory_context_statistics_dsa",
+ &found);
+
+ if (MemoryStatsDsHash == NULL)
+ MemoryStatsDsHash =
GetNamedDSHash("memory_context_statistics_dshash",
+ &memctx_dsh_params, &found);

From the comment, it sounded to me as if the client executing
pg_get_process_memory_contexts() might not create the DSA in some cases.
Is it correct to assume that such a situation can happen?
In [1], as a response to concerns about using DSA inside a CFI handler,
you wrote that “all the dynamic shared memory needed to store the
statistics is created and deleted in the client function”.
So I understood that it would never create the DSA inside the CFI
handler.
If that understanding is correct, perhaps the comment should be reworded
to make that clear.

+ context_id_lookup =
hash_create("pg_get_remote_backend_memory_contexts",

This appears to use the old function name. Should this be updated to
"pg_get_process_memory_contexts" instead?

[1]
https://www.postgresql.org/message-id/CAH2L28sc-rEhyntPLoaC2XUa0ZjS5ka6KzEbuSVxQBBnUYu1KQ%40mail.gmail.com

--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2025-12-08 01:56:42 RE: Newly created replication slot may be invalidated by checkpoint
Previous Message Michael Paquier 2025-12-08 01:25:45 Re: Extended Statistics set/restore/clear functions.