Re: Crash by targetted recovery

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Crash by targetted recovery
Date: 2020-02-27 08:05:30
Message-ID: 20200227.170530.2264412619413068664.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >> I failed to understand why random access while reading from
> >> stream is bad idea. Could you elaborate why?
> > It seems to me the word "streaming" suggests that WAL record should be
> > read sequentially. Random access, which means reading from arbitrary
> > location, breaks a stream. (But the patch doesn't try to stop wal
> > sender if randAccess.)
> >
> >> Isn't it sufficient to set currentSource to 0 when disabling
> >> StandbyMode?
> > I thought that and it should work, but I hesitated to manipulate on
> > currentSource in StartupXLOG. currentSource is basically a private
> > state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> > think it's not good to modify it out of the the logic in
> > WaitForWALToBecomeAvailable.
>
> If so, what about adding the following at the top of
> WaitForWALToBecomeAvailable()?
>
> if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
> currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.

> > Come to think of that I got to think the
> > following part in ReadRecord should use randAccess instead..
> > xlog.c:4384
> >> /*
> > - * Before we retry, reset lastSourceFailed and currentSource
> > - * so that we will check the archive next.
> > + * Streaming has broken, we retry from the same LSN.
> >> */
> >> lastSourceFailed = false;
> > - currentSource = 0;
> > + private->randAccess = true;
>
> Sorry, I failed to understand why this change is necessary...

It's not necessary, just for being tidy about the responsibility on
currentSource.

> At least the comment that you added seems incorrect
> because WAL streaming should not have started yet when
> we reach the above point.

Oops, right.

- * Streaming has broken, we retry from the same LSN.
+ * Restart recovery from the current LSN.

For clarity, I don't insist on the change at all. If it were
necessary, it's another topic, anyway. Please forget it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v2-0001-TAP-test-for-a-crash-bug.patch text/x-patch 2.1 KB
v2-0002-Fix-a-crash-bug-of-targetted-promotion.patch text/x-patch 1.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-02-27 08:07:35 Re: reindex concurrently and two toast indexes
Previous Message Julien Rouhaud 2020-02-27 07:45:35 Re: Collation versioning