From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Crash by targetted recovery |
Date: | 2020-03-05 10:51:11 |
Message-ID: | 0db8f83f-5b66-75e9-6d99-93f19fc33165@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020/03/05 12:08, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>> And random access during StandbyMode ususally (always?) lets RecPtr go
>>> back. I'm not sure WaitForWALToBecomeAvailable works correctly if we
>>> don't have a file in pg_wal and the REDO point is far back by more
>>> than a segment from the initial checkpoint record.
>>
>> It works correctly. This is why WaitForWALToBecomeAvailable() uses
>> fetching_ckpt argument.
>
> Hmm. You're right. We start streaming from RedoStartLSN when
> fetching_ckpt. So that doesn't happen.
>
>>> If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
>>> re-connects to the primary for the past segment.
>>
>> But this can lead to unnecessary restart of walreceiver. Since
>> fetching_ckpt ensures that the WAL file containing the REDO
>> starting record exists in pg_wal, there is no need to reconnect
>> to the primary when reading the REDO starting record.
>>
>> Is there other case where we need to go back to XLOG_FROM_ARCHIVE
>> by random access?
>
> I understand that the reconnection for REDO record is useless. Ok I
> take the !StandbyMode way.
>
> The attached is the test script that is changed to count the added
> test, and the slight revised main patch.
Thanks for the patch!
+ /* Wal receiver should not active when entring XLOG_FROM_ARCHIVE */
+ Assert(!WalRcvStreaming());
+1 to add this assertion check.
Isn't it better to always check this while trying to read WAL from
archive or pg_wal? So, what about the following change?
{
case XLOG_FROM_ARCHIVE:
case XLOG_FROM_PG_WAL:
+ /*
+ * WAL receiver should not be running while trying to
+ * read WAL from archive or pg_wal.
+ */
+ Assert(!WalRcvStreaming());
+
/* Close any old file we might have open. */
if (readFile >= 0)
+ lastSourceFailed = false; /* We haven't failed on the new source */
Is this really necessary? Since ReadRecord() always reset
lastSourceFailed to false, it seems not necessary.
- else if (currentSource == 0)
+ else if (currentSource == 0 ||
Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
There are some places where 0 is used as the value of currentSource.
IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2020-03-05 11:16:11 | Re: Some problems of recovery conflict wait events |
Previous Message | Darafei Komяpa Praliaskouski | 2020-03-05 10:16:22 | Re: Berserk Autovacuum (let's save next Mandrill) |