Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: kuroda(dot)hayato(at)fujitsu(dot)com, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date: 2021-03-18 10:16:18
Message-ID: c96d8989100e4bce4fa586064aa7e0e9@oss.nttdata.com
Lists: pgsql-hackers

On 2021-03-18 13:37, Fujii Masao wrote:
> On 2021/03/18 11:59, kuroda(dot)hayato(at)fujitsu(dot)com wrote:
>> Dear Ikeda-san,
>>
>> I confirmed new patch and no problem was found. Thanks.
>> (I'm not a native English speaker, so I cannot check your comments
>> correctly, sorry)
>
> One user-visible side effect of this change is that, with the patch,
> the stats are cleared when only the stats collector is accidentally
> killed (with SIGQUIT) and later restarted by the postmaster.

Thanks for your comments.

As you said, the temporary stats files are not removed if the stats
collector is killed with SIGQUIT.
So, if the user changes the GUC parameter "stats_temp_directory" after
an immediate shutdown, the old temporary stats files will never be
removed.

But I think this case rarely happens and is unimportant. In fact,
pgstat_write_statsfiles() does not check the return value of unlink(),
and the same problem already occurs today if the server crashes.
The documentation says the following, which I think is sufficient.

```
For better performance, <varname>stats_temp_directory</varname> can be
pointed at a RAM-based file system, decreasing physical I/O requirements.
When the server shuts down cleanly, a permanent copy of the statistics
data is stored in the <filename>pg_stat</filename> subdirectory, so that
statistics can be retained across server restarts. When recovery is
performed at server start (e.g., after immediate shutdown, server crash,
and point-in-time recovery), all statistics counters are reset.
```
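
For reference, here is a minimal sketch of what checking the unlink()
result could look like. It is only an illustration of the point about
pgstat_write_statsfiles() and is not part of the patch; the helper name
remove_temp_statsfile() is hypothetical and does not exist in the
PostgreSQL source.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (assumption, not PostgreSQL code): remove a
 * temporary stats file and report any failure. A file that is already
 * gone (ENOENT) is treated as success, since the goal is only that the
 * file does not remain.
 *
 * Returns 0 on success, -1 on any other error.
 */
static int
remove_temp_statsfile(const char *path)
{
    if (unlink(path) == 0)
        return 0;               /* file removed */

    if (errno == ENOENT)
        return 0;               /* nothing to remove */

    fprintf(stderr, "could not remove file \"%s\": %s\n",
            path, strerror(errno));
    return -1;
}
```

Even with such a check, a file left behind in an old
stats_temp_directory would not be found, so it would not solve the case
described above.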

> On the other hand, currently the stats are written in that case and
> the subsequently restarted stats collector can use that stats file.
> I'm not sure if we need to keep supporting this behavior, though.

I don't have a strong opinion either, but I doubt this behavior is
very useful.

Since the reinitialization phase is not executed when only the stats
collector crashes (because it does not touch shared memory), the
restarted stats collector can use the permanent stats file if it
exists. But, IIUC, this case is rare.

This case arises from an operational mistake in which an operator sends
SIGQUIT to the stats collector. Please let me know if there are more
important cases.

But if an operator sends SIGKILL, the stats cannot be rescued even now,
because the permanent stats files cannot be written before exiting.
Since the cases in which the stats can be rescued are rare, I think it's
OK to reset the stats even when SIGQUIT is sent.
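
To make the difference between the signals concrete, here is a small
standalone sketch. It is not the patch and does not use PostgreSQL's
signal infrastructure; all names (ExitRequest, on_sigterm, on_sigquit,
and so on) are hypothetical. It only shows that SIGTERM and SIGQUIT can
be caught and mapped to "write the stats file" or "skip it", while
SIGKILL cannot be caught at all.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in PostgreSQL. */
typedef enum
{
    REQ_NONE,                   /* keep running */
    REQ_WRITE_STATS,            /* SIGTERM: exit after writing the stats file */
    REQ_SKIP_STATS              /* SIGQUIT: exit without writing it */
} ExitRequest;

static volatile sig_atomic_t exit_request = REQ_NONE;

/* Handlers only record the request; the main loop would act on it. */
static void
on_sigterm(int signo)
{
    (void) signo;
    exit_request = REQ_WRITE_STATS;
}

static void
on_sigquit(int signo)
{
    (void) signo;
    exit_request = REQ_SKIP_STATS;
}

/* Returns true if both handlers were installed. */
static bool
install_exit_handlers(void)
{
    return signal(SIGTERM, on_sigterm) != SIG_ERR &&
           signal(SIGQUIT, on_sigquit) != SIG_ERR;
}

/* SIGKILL can never be caught, so nothing can be written on that path. */
static bool
can_catch_sigkill(void)
{
    return signal(SIGKILL, on_sigterm) != SIG_ERR;
}

/* What the exit path would consult before writing the permanent file. */
static bool
should_write_statsfile(void)
{
    return exit_request == REQ_WRITE_STATS;
}
```

With the SIGQUIT path skipping the write, it ends up with the same
result that the SIGKILL path already has today.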

If we wanted to support preserving the stats in that case, we would
first need to implement the following.

> (2) As another aspect, it needs to change the behavior removing all
> stats files because autovacuum
> uses the stats. There are some ideas, for example writing the stats
> files every X minutes
> (using wal or another mechanism) and even if a crash occurs, the
> startup process can restore
> the stats with slightly low freshness. But, it needs to find a way
> how to handle the stats files
> when deleting on PITR rewind or stats collector crash happens.

> When only the stats collector exits by SIGQUIT, with the patch
> FreeWaitEventSet() is also skipped. Is this ok?

Thanks, I fixed it.

> - * Loop to process messages until we get SIGQUIT or detect ungraceful
> - * death of our parent postmaster.
> + * Loop to process messages until we get SIGTERM or SIGQUIT of our parent
> + * postmaster.
>
> "SIGTERM or SIGQUIT of our parent postmaster" should be
> "SIGTERM, SIGQUIT, or detect ungraceful death of our parent postmaster"?

Yes, I fixed it.

> +SignalHandlerForUnsafeExit(SIGNAL_ARGS)
>
> I don't think SignalHandlerForUnsafeExit is good name. Because that's
> not "unsafe" exit. No? Even after this signal handler is triggered, the
> server is still running normally and a process that exits will be
> restarted later. What about SignalHandlerForNonCrashExit or
> SignalHandlerForNonFatalExit?

OK, I fixed it.
I changed it to SignalPgstatHandlerForNonCrashExit() so that the handler
for the stats collector can also call FreeWaitEventSet().

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment: v4-0001-pgstat_avoid_writing_on_sigquit.patch (text/x-diff, 4.3 KB)
