Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: kuroda(dot)hayato(at)fujitsu(dot)com, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date: 2021-03-26 00:54:40
Message-ID: 3d721f7d-3c8f-6fc2-d0a9-94c299d6a01b@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/03/26 9:25, Masahiro Ikeda wrote:
>
> On 2021/03/25 21:23, Fujii Masao wrote:
>> On 2021/03/25 9:58, Andres Freund wrote:
>>> Outside of very
>>> narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 no
>>> _exit() (as opposed to exit()) seem appropriate. This seems like a bad
>>> idea.
>>
>> So you're thinking that the stats collector should do proc_exit(0)
>> or something even when it receives immediate shutdown request?
>>
>> One idea to avoid using _exit(1) is to change the SIGQUIT handler
>> so that it just sets the flag. Then if the stats collector detects that
>> the flag is set in the main loop, it gets out of the loop,
>> skips writing the permanent stats file and then exits with exit(0).
>> That is, normal and immediate shutdown requests are treated
>> almost the same way in the stats collector. Only the difference of
>> them is whether it saves the stats to the file or not. Thought?
>
> Although I don't oppose the idea to change from _exit(1) to proc_exit(0), the
> reason why I used _exit(1) is that I thought the behavior to skip writing the
> stat counters is not normal exit. Actually, other background processes use
> _exit(2) instead of _exit(0) or proc_exit(0) in immediate shutdown although
> the status code is different because they touch shared memory.

True.

>
>
>>> Also, won't this lead to postmaster now starting to complain about
>>> pgstat having crashed in an immediate shutdown? Right now only exit
>>> status 0 is expected:
>>>
>>>         if (pid == PgStatPID)
>>>         {
>>>             PgStatPID = 0;
>>>             if (!EXIT_STATUS_0(exitstatus))
>>>                 LogChildExit(LOG, _("statistics collector process"),
>>>                              pid, exitstatus);
>>>             if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
>>>                 PgStatPID = pgstat_start();
>>>             continue;
>>>         }
>>
>> No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
>> ImmdiateShutdown and HandleChildCrash() doesn't complain that in that case
>> because of the following.
>>
>>     /*
>>      * We only log messages and send signals if this is the first process
>>      * crash and we're not doing an immediate shutdown; otherwise, we're only
>>      * here to update postmaster's idea of live processes.  If we have already
>>      * signaled children, nonzero exit status is to be expected, so don't
>>      * clutter log.
>>      */
>>     take_action = !FatalError && Shutdown != ImmediateShutdown;
>
> IIUC, in immediate shutdown case (and other cases too?), HandleChildCrash() is
> never invoked due to the exit of pgstat. My understanding of Andres-san's
> comment is that is it ok to show like the following log message?
>
> ```
> LOG: statistics collector process (PID 64991) exited with exit code 1
> ```
>
> Surely, other processes don't output the log like it. So, I agree to suppress
> the log message.

Yes. I was mistakenly thinking that HandleChildCrash() was called
even when the stats collector exits with non-zero code, like other processes.
But that's not true.

How should we do this? HandleChildCrash() calls LogChildExit()
when FatalError = false and Shutdown != ImmediateShutdown.
We should use the same conditions for the stats collector as follows?

if (pid == PgStatPID)
{
PgStatPID = 0;
- if (!EXIT_STATUS_0(exitstatus))
+ if (!EXIT_STATUS_0(exitstatus) && !FatalError &&
+ Shutdown != ImmediateShutdown)
LogChildExit(LOG, _("statistics collector process"),
pid, exitstatus);

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-03-26 01:08:28 Re: wal stats questions
Previous Message Michael Paquier 2021-03-26 00:49:00 Re: DETAIL for wrong scram password