Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date: 2021-03-25 12:23:17
Message-ID: 633cbcc3-08cf-2560-e5cf-91f567282dbc@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/03/25 9:58, Andres Freund wrote:
> Hi,
>
> On 2021-03-24 18:36:14 +0900, Fujii Masao wrote:
>> Barring any objection, I'm thinking to commit this patch.
>
> To which branches?

I was thinking to push the patch to the master branch
because this is not a bug fix.

> I am *strongly* opposed to backpatching this.

+1

> This strikes me as a quite a misleading function name.

Yeah, better name is always welcome :)

> Outside of very
> narrow circumstances a normal exit shouldn't use _exit(1). Neither 1 no
> _exit() (as opposed to exit()) seem appropriate. This seems like a bad
> idea.

So you're thinking that the stats collector should do proc_exit(0)
or something even when it receives immediate shutdown request?

One idea to avoid using _exit(1) is to change the SIGQUIT handler
so that it just sets the flag. Then if the stats collector detects that
the flag is set in the main loop, it gets out of the loop,
skips writing the permanent stats file and then exits with exit(0).
That is, normal and immediate shutdown requests are treated
almost the same way in the stats collector. Only the difference of
them is whether it saves the stats to the file or not. Thought?

> Also, won't this lead to postmaster now starting to complain about
> pgstat having crashed in an immediate shutdown? Right now only exit
> status 0 is expected:
>
> if (pid == PgStatPID)
> {
> PgStatPID = 0;
> if (!EXIT_STATUS_0(exitstatus))
> LogChildExit(LOG, _("statistics collector process"),
> pid, exitstatus);
> if (pmState == PM_RUN || pmState == PM_HOT_STANDBY)
> PgStatPID = pgstat_start();
> continue;
> }

No. In immediate shutdown case, pmdie() sets "Shutdown" variable to
ImmdiateShutdown and HandleChildCrash() doesn't complain that in that case
because of the following.

/*
* We only log messages and send signals if this is the first process
* crash and we're not doing an immediate shutdown; otherwise, we're only
* here to update postmaster's idea of live processes. If we have already
* signaled children, nonzero exit status is to be expected, so don't
* clutter log.
*/
take_action = !FatalError && Shutdown != ImmediateShutdown;

>> + * The statistics collector is started by the postmaster as soon as the
>> + * startup subprocess finishes, or as soon as the postmaster is ready
>> + * to accept read only connections during archive recovery. It remains
>> + * alive until the postmaster commands it to terminate. Normal
>> + * termination is by SIGUSR2 after the checkpointer must exit(0),
>> + * which instructs the statistics collector to save the final statistics
>> + * to reuse at next startup and then exit(0).
>
> I can't parse "...after the checkpointer must exit(0), which instructs..."

What about changing that to the following?

------------------------
Normal termination is requested after checkpointer exits. It's by SIGUSR2,
which instructs the statistics collector to save the final statistics and
then exit(0).
------------------------

>> + * Emergency termination is by SIGQUIT; like any backend, the statistics
>> + * collector will exit quickly without saving the final statistics. It's
>> + * ok because the startup process will remove all statistics at next
>> + * startup after emergency termination.
>
> Normally there won't be any stats to remove, no? The permanent stats
> file has been removed when the stats collector started up.

Yes. In normal case, when the stats collector starts up, it loads the stats
from the file and remove the file. OTOH, when the recovery is necessary,
instead the startup process calls pgstat_reset_all() and removes the stats files.

You're thinking that the above comments are confusing?
If so, what about the following?

----------------------
Emergency termination is by SIGQUIT; the statistics collector
will exit quickly without saving the final statistics. In this case
the statistics files don't need to be written because the next startup
will trigger a crash recovery and remove all statistics files forcibly
even if they exist.
----------------------

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2021-03-25 12:24:36 Re: [PATCH 1/1] Initial mach based shared memory support.
Previous Message Daniel Verite 2021-03-25 12:13:25 Re: insensitive collations