Re: Get memory contexts of an arbitrary backend process

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Kasahara Tatsuhito <kasahara(dot)tatsuhito(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Georgios Kokolatos <gkokolatos(at)protonmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get memory contexts of an arbitrary backend process
Date: 2021-01-04 05:34:50
Message-ID: 9a59401241e9269254e70c8e0ef1cc7a@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 25, 2020 at 6:08 PM Kasahara Tatsuhito
<kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:

Thanks for reviewing and kind suggestion!

>> Attached a rewritten patch.
> Thanks for updating patch.
>
> But when I had applyed the patch to the current HEAD and did make, I
> got an error due to duplicate OIDs.
> You need to rebase the patch.

Assigned another OID.

>> Accordingly, I also slightly modified the basic design as below.
>>
>> ---
>> # Communication flow between the dumper and the requester
>> - (1) When requesting memory context dumping, the dumper changes
>> the struct on the shared memory state from 'ACCEPTABLE' to
>> 'REQUESTING'.
>> - (2) The dumper sends the signal to the dumper process and wait on
>> the latch.
>> - (3) When the dumper noticed the signal, it changes the state to
>> 'DUMPING'.
>> - (4) When the dumper completes dumping, it changes the state to
>> 'DONE' and set the latch.
>> - (5) The requestor reads the dump file and shows it to the user.
>> Finally, the requestor removes the dump file and reset the shared
>> memory state to 'ACCEPTABLE'.
>>
>> # Query cancellation
>> - When the requestor cancels dumping, e.g. signaling using ctrl-C,
>> the requestor changes the state of the shared memory to 'CANCELING'.
>> - The dumper checks the state when it tries to change the state to
>> 'DONE' at (4), and if the state is 'CANCELING', it initializes the
>> dump file and reset the shared memory state to 'ACCEPTABLE'.
>>
>> # Cleanup dump file and the shared memory
>> - In the normal case, the dumper removes the dump file and resets
>> the shared memory entry as described in (5).
>> - When something like query cancellation or process termination
>> happens on the dumper after (1) and before (3), in other words,
>> the state is 'REQUESTING', the requestor does the cleanup.
>> - When something happens on the dumper or the requestor after (3)
>> and before (4), in other words, the state is 'DUMPING', the dumper
>> does the cleanup. Specifically, if the requestor cancels the query,
>> it just changes the state to 'CANCELING' and the dumper notices it
>> and cleans up things later.
>> OTOH, when the dumper fails to dump, it cleans up the dump file and
>> reset the shared memory state.
>> - When something happens on the requestor after (4), i.e., the state
>> is 'DONE', the requestor does the cleanup.
>> - In the case of receiving SIGKILL or power failure, all dump files
>> are removed in the crash recovery process.
>> ---
> If the dumper is terminated before it dumps, the requestor will appear
> to enter an
> infinite loop because the status of mcxtdumpShmem will not change.
> The following are the steps to reproduce.
>
>  - session1
>    BEGIN; LOCK TABLE t;
>    - session2
>      SELECT * FROM t; -- wait
>      - session3
>        select pg_get_target_backend_memory_contexts(<pid of session2>);
> -- wait
>  - session1
>    select pg_terminate_backend(<pid of session2>); -- kill session2
>      - session3 waits forever.
>
> Therefore, you may need to set mcxtdumpShmem->dump_status to
> MCXTDUMPSTATUS_CANCELING
> or other status before the dumper terminates.

In this case, it may be difficult for the dumper to change dump_status
because
it's waiting for latch and dump_memory_contexts() is not called yet.

Instead, it's possible for the requestor to check the existence of the
dumper
process periodically during waiting.
I added this logic to the attached patch.

> Also, although I have not been able to reproduce it, I believe that
> with the current design,
> if the requestor disappears right after the dumper dumps the memory
> information,
> the dump file will remain.
> Since the current design appears to allow only one requestor per
> instance, when the requestor
> requests a dump, it might be a good idea to delete any remaining dump
> files, if any.

Although I'm not sure when the dump file remains, deleting any remaining
dump
files seems good for safety.
I also added this idea to the attached patch.

> The following are comments on the code.
>
> +   proc = BackendPidGetProc(dst_pid);
> +
> +   if (proc == NULL)
> +   {
> +       ereport(WARNING,
> +               (errmsg("PID %d is not a PostgreSQL server process",
> dst_pid)));
> +
> +       return (Datum) 1;
> +   }
> For now, background writer, checkpointer and WAL writer are belong to
> the auxiliary process.
> Therefore, if we specify the PIDs of these processes for
> pg_get_target_backend_memory_contexts(),
> "PID xxxx is not a PostgreSQL server process" whould be outoput.
> This confuses the user.
> How about use AuxiliaryPidGetProc() to determine these processes?

Thanks and I modified the patch to output the below message when it's an
auxiliary process.

| PID %d is not a PostgreSQL backend process but an auxiliary process.

>
> +               ereport(INFO,
> +                   (errmsg("The request has failed and now PID %d is
> requsting dumping.",
> +                       mcxtdumpShmem->src_pid)));
> +
> +               LWLockRelease(McxtDumpLock);
> You can release LWLock before ereport.

Modified to release the lock before ereport.

> +   Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING);
> typo?
> It might be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING".

Ops, it's a serious typo. Fixed it.

> +           ereport(INFO,
> +                   (errmsg("PID %d is looked up by another process",
> pid)));
> This message is not accurate.
> Because, in the current design, only one session can request a dump,
> so if there are multiple requests, this message will be output even if
> the
> request is for a PID that is not looked up.

Exactly. Modified the message as below.

| Only one session can dump at a time and another session is dumping
currently.

> +   if (fpout == NULL)
> +   {
> +       ereport(LOG,
> +               (errcode_for_file_access(),
> +                errmsg("could not write memory context file \"%s\":
> %m",
> +                       dumpfile)));
> It would be better to use "dump file" instead of "memory context file".
>
> +static void
> +McxtReqKill(int code, Datum arg)
> The function McxtReqKill looks like it should have a different name.
> (Since it doesn't do any kill).
> How about McxtReqcCleanup or something similar?

Thanks for your idea and I changed the name to McxtReqCleanup.

Regards,

Attachment Content-Type Size
v6-0001-Add-pg_get_target_backend_memory_contexts-to-coll.patch text/x-diff 33.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-01-04 05:37:41 Re: Test harness for regex code (to allow importing Tcl's test suite)
Previous Message Abhijit Menon-Sen 2021-01-04 05:21:36 Re: pg_waldump/heapdesc.c and struct field names