Re: Start Walreceiver completely before shut down it on standby server.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: aagrawal(at)pivotal(dot)io
Cc: liujk1994(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Start Walreceiver completely before shut down it on standby server.
Date: 2019-12-11 05:37:37
Message-ID: 20191211.143737.1611770954720850726.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 10 Dec 2019 10:40:53 -0800, Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote in
> On Tue, Dec 10, 2019 at 3:06 AM jiankang liu <liujk1994(at)gmail(dot)com> wrote:
>
> > Start Walreceiver completely before shut down it on standby server.
> >
> > The walreceiver will be shut down when it reads an invalid record in the
> > WAL streamed from the master. Then we retry from archive/pg_wal again.
> >
> > After that, we start the walreceiver in RequestXLogStreaming() and read
> > records from the WAL stream. But before the walreceiver starts, we read
> > data from the file that was streamed over last time and is present in
> > pg_wal, because walrcv->receivedUpto > RecPtr and that WAL is actually
> > flushed to disk. Now we read the invalid record again, and what do we do
> > next? Shut down the walreceiver and do it all again.
> >
>
> I am missing something here: if walrcv->receivedUpto > RecPtr, why are we
> getting / reading an invalid record?

I bet that the standby is connecting to the wrong master. For
example, something like this happens when the master has been
reinitialized from a backup and has gone on to a different history,
and the standby was then initialized from the reborn master while the
stale archive files on the standby were left alone.

Anyway, that cannot happen in a correctly running replication set,
and what to do in that case is to start over from a new base backup of
the master, making sure to erase any stale archive files.

As for the proposed fix, it doesn't seem to cause the startup process
to rewind WAL to that LSN. Even if it did, the result would be no
better than a broken database.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
