Re: Crash by targetted recovery

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Crash by targetted recovery
Date: 2020-03-02 11:54:04
Message-ID: f7c69cf0-fcac-91b4-6fa2-f2560b6c49b5@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/02/28 12:13, Kyotaro Horiguchi wrote:
> At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>
>>
>> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
>>> Thank you for the comment.
>>> At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
>>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>>> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
>>>>>> I failed to understand why random access while reading from
>>>>>> stream is bad idea. Could you elaborate why?
>>>>> It seems to me the word "streaming" suggests that WAL record should be
>>>>> read sequentially. Random access, which means reading from arbitrary
>>>>> location, breaks a stream. (But the patch doesn't try to stop wal
>>>>> sender if randAccess.)
>>>>>
>>>>>> Isn't it sufficient to set currentSource to 0 when disabling
>>>>>> StandbyMode?
>>>>> I thought that and it should work, but I hesitated to manipulate on
>>>>> currentSource in StartupXLOG. currentSource is basically a private
>>>>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
>>>>> think it's not good to modify it out of the the logic in
>>>>> WaitForWALToBecomeAvailable.
>>>>
>>>> If so, what about adding the following at the top of
>>>> WaitForWALToBecomeAvailable()?
>>>>
>>>> if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>>>> currentSource = 0;
>>> It works virtually the same way. I'm happy to do that if you don't
>>> agree to using randAccess. But I'd rather do that in the 'if
>>> (!InArchiveRecovery)' section.
>>
>> The approach using randAccess seems unsafe. Please imagine
>> the case where currentSource is changed to XLOG_FROM_ARCHIVE
>> because randAccess is true, while walreceiver is still running.
>> For example, this case can occur when the record at REDO
>> starting point is fetched with randAccess = true after walreceiver
>> is invoked to fetch the last checkpoint record. The situation
>> "currentSource != XLOG_FROM_STREAM while walreceiver is
>> running" seems invalid. No?
>
> When I mentioned an possibility of changing ReadRecord so that it
> modifies randAccess instead of currentSource, I thought that
> WaitForWALToBecomeAvailable should shutdown wal receiver as
> needed.
>
> At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> me> location, breaks a stream. (But the patch doesn't try to stop wal
> me> sender if randAccess.)

Sorry, I failed to notice that.

> And random access during StandbyMode ususally (always?) lets RecPtr go
> back. I'm not sure WaitForWALToBecomeAvailable works correctly if we
> don't have a file in pg_wal and the REDO point is far back by more
> than a segment from the initial checkpoint record.

It works correctly. This is why WaitForWALToBecomeAvailable() uses
fetching_ckpt argument.

> If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
> re-connects to the primary for the past segment.

But this can lead to unnecessary restart of walreceiver. Since
fetching_ckpt ensures that the WAL file containing the REDO
starting record exists in pg_wal, there is no need to reconnect
to the primary when reading the REDO starting record.

Is there other case where we need to go back to XLOG_FROM_ARCHIVE
by random access?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Juan José Santamaría Flecha 2020-03-02 12:00:44 Re: color by default
Previous Message Amit Kapila 2020-03-02 11:26:40 Re: logical replication empty transactions