| From: | Rahila Syed <rahilasyed90(at)gmail(dot)com> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Enhancing Memory Context Statistics Reporting |
| Date: | 2026-02-02 12:28:21 |
| Message-ID: | CAH2L28umvzm9YwQBa_zrJM0ABC57ZAKPBxi8bD+n=1ADJ3SOZw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Robert, Daniel
Please find attached the updated patches which incorporate your feedback.
Following are responses to Robert's comments:
Unfortunately, somebody is going to need to think through - and
> perhaps test - what happens in each individual type of background
> process -- not just auxiliary processes but also background workers,
> including but not limited to parallel workers. Some of them have error
> recovery logic that is similar to transaction cleanup but only covers
> a subset of things, in which case the question will be whether that
> logic handles all the kinds of resources that this code might acquire.
> Some of them may just straight up kill the process if an error occurs,
> which is fine for this patch as long as it only happens in extreme
> situations (e.g. OOM).
>
I tested the resource cleanup in case of error, with respect to each of the
processes types i.e Auxiliary process/background workers/client backends
and below are my findings.
The only resources acquired within the ProcessGetMemoryContextInterrupt
function are of the kind dsm_resowner_desc through GetNamedDSA and
GetNamedDsHash.
However, these functions make the resource owner forget these resources
when they call dsm_pin_mapping. This is done to prevent them from
getting detached at the time of error handling or transaction abort/commit.
Because when a dsm resource is released, the associated dsm segment is
detached.
Since we want the DSM segments to remain mapped until a process
exits, they are not added to resource owners. This is done intentionally to
prevent repeatedly attaching or detaching the DSM segments, if the same
process is queried for statistics multiple times.
As no resources are tracked during the execution
ProcessGetMemoryContextInterrupt
irrespective of its execution within or outside a transaction, the error
handling
in both the cases does not process resource cleanup unless the process
exits.
During proc_exit, the cleanup/detach runs via dsm_backend_shutdown().
> + LWLockAcquire(client_keys_lock, LW_SHARED);
>
> LWLocks normally use NamesLikeThis not names_like_this and are
> generally created by just listing them in lwlocklist.h.
>
Fixed.
Also, it's not really safe to acquire an LWLock if there's no
> transaction active. If we error afterward, what will release the
> LWLock?
I verified that when an error occurs outside of a transaction in various
types of processes, LWLockReleaseAll() is called during error handling.
1. For auxiliary processes - called within sigsetjmp() from main functions.
2. For client backend - called during process exit as error is converted to
"FATAL: terminating connection because protocol synchronization was lost"
3. For a background worker like parallel worker - called during process
exit,
as the worker is killed on encountering an error.
+ else
> + {
> + clientProcNumber = client_keys[MyProcNumber];
> + client_keys[MyProcNumber] = -1;
> + LWLockRelease(client_keys_lock);
> + }
>
> The "else" is not really necessary here because the "if" portion ends
> with "return
Fixed accordingly.
> One other point to note here is that if, by some chance, a failure
occurs as a result of receiving a memory-context interrupt, it's going
to fail the currently-running transaction
In the updated version, we make sure that no errors are being thrown
by ProcessMemoryContextInterrupt and other functions added by this
patch.
I have also added comments above the GetNamedDSA and GetNamedDsHash
functions to highlight the importance of minimizing errors in these
functions,
as they are called from CFI.
This way we try to prevent any non-critical errors from being thrown.
Following are responses to Daniel's comments:
> > Attached is the updated patch that addresses this.
>
> A few small comments on v47:
>
> +static const char *ContextTypeToString(NodeTag type);
> I think context_type_to_string() would be a better name on this internal
> function to model it closer to the existing int_list_to_array().
> Personally I
> would also place it before its first use to avoid the prototype, but that's
> personal preference.
>
>
Changed accordingly, also moved compute_context_path() so it appears
before its use to avoid the prototype.
>
> +static void
> +memstats_dsa_cleanup(char *key)
> This function warrants a documentation comment describing when it should be
> used safely.
>
Added a comment.
>
>
> + memstats_dsa_cleanup(key);
> + memstats_client_key_reset(procNumber);
> + ConditionVariableCancelSleep();
> + PG_RETURN_NULL();
> I think we should notify the user in these two timeout cases, why not
> adding an
> ereport(NOTICE, "request for memory context statistics timed out")); or
> something with a better wording than that.
>
> Added this.
> + Size sz = 0;
> + Size TotalProcs = 0;
> +
> + TotalProcs = add_size(TotalProcs, NUM_AUXILIARY_PROCS);
> + TotalProcs = add_size(TotalProcs, MaxBackends);
> + sz = add_size(sz, mul_size(TotalProcs, sizeof(int)));
> +
> + return sz
> As we discussed off-list, the call to add_size() call can be omitted as it
> won't affect the calculation.
>
Fixed accordingly.
>
>
> +# Copyright (c) 2025, PostgreSQL Global Development Group
> Here, and possibly elsewhere, it should say 2026 instead I think.
>
>
> Fixed.
Thank you,
Rahila Syed
| Attachment | Content-Type | Size |
|---|---|---|
| v49-0001-Add-function-to-report-memory-context-statistics.patch | application/octet-stream | 59.9 KB |
| v49-0002-Test-module-to-test-memory-context-reporting-wit.patch | application/octet-stream | 8.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Aleksander Alekseev | 2026-02-02 12:38:04 | Re: use the malloc macros in pg_dump.c |
| Previous Message | Fujii Masao | 2026-02-02 12:15:27 | Re: logical apply worker's lock waits in subscriber can stall checkpointer in publisher |