Re: Crash by targetted recovery

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Crash by targetted recovery
Date: 2020-02-28 03:13:18
Message-ID: 20200228.121318.27427858650486112.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
> > Thank you for the comment.
> > At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> >> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >>>> I failed to understand why random access while reading from
> >>>> stream is bad idea. Could you elaborate why?
> >>> It seems to me the word "streaming" suggests that WAL record should be
> >>> read sequentially. Random access, which means reading from arbitrary
> >>> location, breaks a stream. (But the patch doesn't try to stop wal
> >>> sender if randAccess.)
> >>>
> >>>> Isn't it sufficient to set currentSource to 0 when disabling
> >>>> StandbyMode?
> >>> I thought that and it should work, but I hesitated to manipulate on
> >>> currentSource in StartupXLOG. currentSource is basically a private
> >>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> >>> think it's not good to modify it out of the the logic in
> >>> WaitForWALToBecomeAvailable.
> >>
> >> If so, what about adding the following at the top of
> >> WaitForWALToBecomeAvailable()?
> >>
> >> if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
> >> currentSource = 0;
> > It works virtually the same way. I'm happy to do that if you don't
> > agree to using randAccess. But I'd rather do that in the 'if
> > (!InArchiveRecovery)' section.
>
> The approach using randAccess seems unsafe. Please imagine
> the case where currentSource is changed to XLOG_FROM_ARCHIVE
> because randAccess is true, while walreceiver is still running.
> For example, this case can occur when the record at REDO
> starting point is fetched with randAccess = true after walreceiver
> is invoked to fetch the last checkpoint record. The situation
> "currentSource != XLOG_FROM_STREAM while walreceiver is
> running" seems invalid. No?

When I mentioned an possibility of changing ReadRecord so that it
modifies randAccess instead of currentSource, I thought that
WaitForWALToBecomeAvailable should shutdown wal receiver as
needed.

At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
me> location, breaks a stream. (But the patch doesn't try to stop wal
me> sender if randAccess.)

And random access during StandbyMode ususally (always?) lets RecPtr go
back. I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record. (It seems to cause
assertion failure, but I haven't checked that.)

If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.

> So I think that the approach that I proposed is better.

It depends on how far we assume RecPtr go back.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-02-28 03:57:59 Re: Autovacuum on partitioned table
Previous Message Amit Langote 2020-02-28 02:31:23 Re: Autovacuum on partitioned table