Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: kuroda(dot)hayato(at)fujitsu(dot)com, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
Date: 2021-03-26 00:27:19
Message-ID: 70fb08e8-c141-77aa-347e-804c1be6f959@oss.nttdata.com
Lists: pgsql-hackers

On 2021/03/25 19:48, Fujii Masao wrote:
>
>
> On 2021/03/25 9:31, Masahiro Ikeda wrote:
>>
>>
>> On 2021/03/24 18:36, Fujii Masao wrote:
>>>
>>>
>>> On 2021/03/24 3:51, Andres Freund wrote:
>>>> Hi,
>>>>
>>>> On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
>>>>> This fact makes me wonder whether, if we collect the statistics about WAL
>>>>> writing from walreceiver as we discussed in the other thread, the stats
>>>>> collector should be invoked at an earlier stage. IIUC walreceiver can be
>>>>> invoked before PMSIGNAL_BEGIN_HOT_STANDBY is sent.
>>>>
>>>> FWIW, in the shared memory stats patch the stats subsystem is
>>>> initialized early on by the startup process.
>>>
>>> This is good news!
>>
>> Fujii-san, Andres-san,
>> Thanks for your comments!
>>
>> I hadn't thought about the startup order. From that point of view, I noticed
>> that the current source code has two other concerns.
>>
>>
>> 1. This problem is not limited to the wal receiver.
>>
>> The problem of a process starting before the stats collector is launched
>> during archive recovery affects not only the wal receiver but also the
>> checkpointer and the bgwriter. Before starting redo, the startup process
>> sends the "PMSIGNAL_RECOVERY_STARTED" signal to the postmaster to launch the
>> checkpointer and the bgwriter so that restartpoints can be created.
>>
>> Although the socket for communication between the stats collector and the
>> other processes is created at an earlier stage via pgstat_init(), I agree
>> that starting the stats collector at an earlier stage is a good defensive
>> measure. BTW, in my environment (Linux, net.core.rmem_default = 212992), the
>> socket can buffer almost 300 WAL stats messages. This means that, as you
>> said, if the redo phase is too long, messages can easily be lost.
>>
>>
>> 2. The stats are cleared in the redo phase.
>>
>> The statistics can be reset after the wal receiver, the checkpointer and
>> the wal writer are started in the redo phase. So it is not enough to invoke
>> the stats collector at an earlier stage. We need to fix this as well.
>>
>>
>>
>> (I hope I am not missing something.)
>> Thanks to Andres-san's work ([1]), the above problems will be handled in the
>> shared memory stats patch. The first problem will be resolved because the
>> stats are collected in shared memory, so the stats collector process itself
>> becomes unnecessary. The second problem will be resolved by removing the
>> reset code, because the temporary stats file won't be generated, and if the
>> permanent stats file is corrupted, it is simply recreated.
>
> Yes. So should we wait for the shared memory stats patch to be committed
> before working further on the walreceiver stats patch?

Yes, I agree.
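
As a side note on the socket-buffer figure quoted above, here is a rough
sketch of how such a number could be checked empirically. It is not taken from
the PostgreSQL source: the function name, the 64-byte payload and the send
limit are all hypothetical, and the real WAL stats message size is not
assumed. It sends datagrams to a loopback UDP socket that nobody reads, then
counts how many the kernel kept before it started dropping them.

```python
import socket


def count_buffered_datagrams(payload_size, max_send=5000):
    """Return (datagrams kept by the kernel, SO_RCVBUF as reported by the OS).

    payload_size and max_send are illustrative values, not PostgreSQL constants.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # let the OS pick a free port
        receiver.setblocking(False)       # recv() raises instead of waiting
        address = receiver.getsockname()
        payload = b"x" * payload_size

        # Nobody reads while we send, so the receive buffer fills up and
        # later datagrams are dropped, much like unread stats messages.
        for _ in range(max_send):
            try:
                sender.sendto(payload, address)
            except OSError:
                break                     # some platforms report a full queue

        # Drain the socket and count what actually survived.
        kept = 0
        while True:
            try:
                receiver.recv(payload_size)
            except BlockingIOError:
                break
            kept += 1

        rcvbuf = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        return kept, rcvbuf
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    kept, rcvbuf = count_buffered_datagrams(payload_size=64)
    print(f"SO_RCVBUF={rcvbuf} bytes, datagrams kept={kept}")
```

The count depends on the kernel's per-datagram bookkeeping overhead as well as
on the payload size, so the result is usually far below SO_RCVBUF divided by
the payload size and will differ between platforms.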

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
