Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: kuroda(dot)hayato(at)fujitsu(dot)com, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date: 2021-03-25 10:48:12
Message-ID: 5289df2d-acce-ca30-9a5e-ab75f621cc29@oss.nttdata.com
Lists: pgsql-hackers

On 2021/03/25 9:31, Masahiro Ikeda wrote:
>
>
> On 2021/03/24 18:36, Fujii Masao wrote:
>>
>>
>> On 2021/03/24 3:51, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
>>>> This fact makes me wonder whether, if we collect the statistics about WAL
>>>> writing from walreceiver as we discussed in the other thread, the stats
>>>> collector should be invoked at an earlier stage. IIUC walreceiver can be
>>>> invoked before PMSIGNAL_BEGIN_HOT_STANDBY is sent.
>>>
>>> FWIW, in the shared memory stats patch the stats subsystem is
>>> initialized early on by the startup process.
>>
>> This is good news!
>
> Fujii-san, Andres-san,
> Thanks for your comments!
>
> I hadn't thought about the start order. From that point of view, I noticed
> that the current source code has two other concerns.
>
>
> 1. This problem is not limited to the wal receiver.
>
> The problem of a process starting before the stats collector is launched
> during archive recovery affects not only the wal receiver but also the
> checkpointer and the bgwriter. Before starting redo, the startup process
> sends the postmaster the "PMSIGNAL_RECOVERY_STARTED" signal to launch the
> checkpointer and the bgwriter so that they can create restartpoints.
>
> Although the socket for communication between the stats collector and the
> other processes is created at an earlier stage via pgstat_init(), I agree that
> starting the stats collector at an earlier stage is defensive. BTW, in my
> environment (linux, net.core.rmem_default = 212992), the socket can buffer
> almost 300 WAL stats messages. This means that, as you said, if the redo phase
> is too long, the messages can easily be lost.
>
>
> 2. The stats can be cleared in the redo phase.
>
> The statistics can be reset after the wal receiver, the checkpointer and
> the wal writer are started in the redo phase. So, it's not enough for the
> stats collector to be invoked at an earlier stage. We need to fix it.
>
>
>
> (I hope I am not missing something.)
> Thanks to Andres-san's work([1]), the above problems will be handled in the
> shared memory stats patch. The first problem will be resolved since the stats
> are collected in shared memory, so the stats collector process itself is
> unnecessary. The second problem will be resolved by removing the reset code,
> because the temporary stats file won't be generated, and if the permanent
> stats file is corrupted, it is just recreated.

Yes. So should we wait for the shared memory stats patch to be committed
before working further on the walreceiver stats patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
