Re: Get memory contexts of an arbitrary backend process

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, gkokolatos(at)protonmail(dot)com, kasahara(dot)tatsuhito(at)gmail(dot)com, craig(at)2ndquadrant(dot)com
Subject: Re: Get memory contexts of an arbitrary backend process
Date: 2021-03-30 13:06:58
Message-ID: bbecd722d4f8e261b350186ac4bf68a7@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-03-30 02:28, Fujii Masao wrote:

Thanks for reviewing and kind suggestions!

>> It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
>> of the specified backend process.
>>
>> The number of child contexts to be logged per parent is limited to 100
>> as with MemoryContextStats().
>>
>> As written in commit 7b5ef8f2d07, which limits the verbosity of
>> memory context statistics dumps, it supposes that practical cases
>> where the dump gets long will typically be huge numbers of
>> siblings under the same parent context; while the additional
>> debugging value from seeing details about individual siblings
>> beyond 100 will not be large.
>>
>> Thoughts?
>
> I'm OK with 100. We should comment why we chose 100 for that.

Added following comments.

+ /*
+ * When a backend process is consuming huge memory, logging all
its
+ * memory contexts might overrun available disk space. To
prevent
+ * this, we limit the number of child contexts per parent to
100.
+ *
+ * As with MemoryContextStats(), we suppose that practical cases
+ * where the dump gets long will typically be huge numbers of
+ * siblings under the same parent context; while the additional
+ * debugging value from seeing details about individual siblings
+ * beyond 100 will not be large.
+ */
+ MemoryContextStatsDetail(TopMemoryContext, 100, false);

>
> Here are some review comments.
>
> Isn't it better to move HandleProcSignalLogMemoryContext() and
> ProcessLogMemoryContextInterrupt() to mcxt.c from procsignal.c
> (like the functions for notify interrupt are defined in async.c)
> because they are the functions for memory contexts?

Agreed.
Also renamed HandleProcSignalLogMemoryContext to
HandleLogMemoryContextInterrupt.

> + * HandleProcSignalLogMemoryContext
> + *
> + * Handle receipt of an interrupt indicating log memory context.
> + * Signal handler portion of interrupt handling.
>
> IMO it's better to comment why we need to separate the function into
> two,
> i.e., HandleProcSignalLogMemoryContext() and
> ProcessLogMemoryContextInterrupt(),
> like the comment for other similar function explains. What about the
> followings?

Thanks! Changed them to the suggested one.

> -------------------------------
> HandleLogMemoryContextInterrupt
>
> Handle receipt of an interrupt indicating logging of memory contexts.
>
> All the actual work is deferred to ProcessLogMemoryContextInterrupt(),
> because we cannot safely emit a log message inside the signal handler.
> -------------------------------
> ProcessLogMemoryContextInterrupt
>
> Perform logging of memory contexts of this backend process.
>
> Any backend that participates in ProcSignal signaling must arrange to
> call this function if we see LogMemoryContextPending set. It is called
> from CHECK_FOR_INTERRUPTS(), which is enough because the target process
> for logging of memory contexts is a backend.
> -------------------------------
>
>
> + if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
> + HandleProcSignalLogMemoryContext();
> +
> if (CheckProcSignal(PROCSIG_BARRIER))
> HandleProcSignalBarrierInterrupt();
>
> The code for memory context logging interrupt came after barrier
> interrupt
> in other places, e.g., procsignal.h. Why is this order of code
> different?

Fixed.

> +/*
> + * pg_log_backend_memory_contexts
> + * Print memory context of the specified backend process.
>
> Isn't it better to move pg_log_backend_memory_contexts() to mcxtfuncs.c
> from mcxt.c because this is the SQL function memory contexts?

Agreed.

> IMO we should comment why we allow only superuser to call this
> function.
> What about the following?

Thanks!
Modified the patch according to the suggestions.

> -----------------
> Signal a backend process to log its memory contexts.
>
> Only superusers are allowed to signal to log the memory contexts
> because allowing any users to issue this request at an unbounded rate
> would cause lots of log messages and which can lead to denial of
> service.
> -----------------
>
> + PGPROC *proc = BackendPidGetProc(pid);
> +
> + /* Check whether the target process is PostgreSQL backend process. */
> + if (proc == NULL)
>
> What about adding more comments as follows?
>
> ---------------------------------
> + /*
> + * BackendPidGetProc returns NULL if the pid isn't valid; but by the
> time
> + * we reach kill(), a process for which we get a valid proc here
> might
> + * have terminated on its own. There's no way to acquire a lock on
> an
> + * arbitrary process to prevent that. But since this mechanism is
> usually
> + * used to debug a backend running and consuming lots of memory,
> + * that it might end on its own first and its memory contexts are not
> + * logged is not a problem.
> + */
> + if (proc == NULL)
> + {
> + /*
> + * This is just a warning so a loop-through-resultset will not abort
> + * if one backend logged its memory contexts during the run.
> + */
> + ereport(WARNING,
> ---------------------------------
>
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be a superuser to log memory contexts")));
> + PG_RETURN_BOOL(false);
>
> This PG_RETURN_BOOL(false) is unnecessary because ereport() will emit
> an ERROR?
>
> +$node->psql('postgres', 'select
> pg_log_backend_memory_contexts(pg_backend_pid())');
> +
> +my $logfile = slurp_file($node->logfile);
> +like($logfile, qr/Grand total: \d+ bytes in \d+ blocks/,
> + 'found expected memory context print in the log file');
>
> Isn't there the case where the memory contexts have not been logged yet
> when slurp_file() is called? It's rare, though. If there is the risk
> for that,
> we should wait for the memory contexts to be actually logged?
>
> BTW, I agree that we should have the regression test that calls
> pg_log_backend_memory_contexts() and checks, e.g., that it doesn't
> cause
> segmentation fault. But I'm just wondering if it's a bit overkill to
> make perl
> scriptand start new node to test only this function...

OK, I removed the perl TAP test and added a regression test running
pg_log_backend_memory_contexts(pg_backend_pid()) and checking it returns
't'.

Regards.

Attachment Content-Type Size
v6-0001-add-memorycontext-elog-print.patch text/x-diff 26.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arseny Sher 2021-03-30 13:22:18 Re: Flaky vacuum truncate test in reloptions.sql
Previous Message Julien Rouhaud 2021-03-30 12:54:41 Re: Issue with point_ops and NaN