Re: Start Walreceiver completely before shut down it on standby server.

From: jiankang liu <liujk1994(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Start Walreceiver completely before shut down it on standby server.
Date: 2019-12-11 08:06:26
Message-ID: CAJ+DhQb+JX3JHpF_ktOvai42y43p_N8-pOYHOsqOrCci5HvEhw@mail.gmail.com
Lists: pgsql-hackers

I'm sorry I did not explain it clearly.

While using PG, I ran into the error "incorrect resource manager data
checksum in record at 0/5013730".
The standby server keeps printing this error message and never stops; at
the same time, the walreceiver process is lost.
Over a few months we hit this situation twice, each time while testing with
more than 200 connections doing reads and writes, after 2 or 3 days of
continuous operation.
Maybe it was a disk problem, but it could also have been an operating
system problem that led to the error in the data on disk.

If we do nothing, it prints this error forever and the walreceiver stays
lost.
If we just restart the PG standby server, it prints this error message only
once and then connects to the master. Everything is OK.
Why do we need to restart the server? Why can't the problem be fixed
online? And why is the walreceiver lost?

The record has already been flushed to disk by the walreceiver, and the
startup process keeps reading records and applying them. When it reads an
invalid record, it shuts down the walreceiver with SIGTERM. Then it reads
from ARCHIVE/PG_WAL, which here just means reading files from pg_wal. When
it reads an invalid record again or reaches the end of the file (the read
length is not equal to XLOG_BLCKSZ), the startup process starts the
walreceiver via RequestXLogStreaming() and switches to reading from
XLOG_FROM_STREAM.
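
To make the switching described above concrete, here is a minimal sketch in
C. It is only a toy model, not the PostgreSQL source: StartupModel,
ReadResult, and next_source() are hypothetical names invented for this
illustration, and the two WalSource values stand in for reading from
ARCHIVE/PG_WAL and from XLOG_FROM_STREAM.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the WAL sources named in this mail. */
typedef enum { SRC_ARCHIVE_OR_PG_WAL, SRC_STREAM } WalSource;

/* Outcome of one read attempt by the startup process (illustrative only). */
typedef enum { READ_OK, READ_INVALID_RECORD, READ_END_OF_FILE } ReadResult;

typedef struct {
    WalSource source;            /* where the startup process reads from */
    bool walreceiver_requested;  /* streaming has been (re)requested */
    bool walreceiver_stop_sent;  /* the startup process asked it to stop */
} StartupModel;

/* Advance the model by one read attempt and return the source to use next. */
WalSource next_source(StartupModel *st, ReadResult r)
{
    if (r == READ_OK)
        return st->source;       /* keep reading from the same source */

    if (st->source == SRC_STREAM) {
        /* Failure while streaming: stop the walreceiver, retry from files. */
        st->walreceiver_stop_sent = true;
        st->walreceiver_requested = false;
        st->source = SRC_ARCHIVE_OR_PG_WAL;
    } else {
        /* Failure on files (bad record or end of file): request streaming. */
        st->walreceiver_requested = true;
        st->walreceiver_stop_sent = false;
        st->source = SRC_STREAM;
    }
    return st->source;
}
```

Feeding this model an invalid record while streaming, followed by an end of
file on the local files, brings it back to the stream source with a fresh
streaming request, which is the cycle the rest of this mail is about.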

RequestXLogStreaming() sets walrcv->receiveStart = recptr, and the
walreceiver fetches WAL from the master starting at walrcv->receiveStart.
So this time we should be able to read the new data that the walreceiver
streams from the master, instead of the bad data on disk. The error message
should not be printed endlessly, and the walreceiver should not be lost
after we read an invalid record. But in fact it does not work that way.

What happened?
In the previous step, the startup process started the walreceiver and
switched to reading from XLOG_FROM_STREAM. Then it checks whether the
walreceiver is active before reading. Even if the postmaster has not yet
launched the walreceiver, walrcv->walRcvState == WALRCV_STARTING, so we
consider the walreceiver active and ready to be read from.
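
The gap between "requested" and "actually running" can be sketched as
follows. This is a simplified model, not the real shared-memory struct: the
ModelWalRcv type, its fields, and both functions are hypothetical names, and
the stricter check is shown only for contrast.

```c
#include <stdbool.h>

/* Simplified model of the walreceiver states mentioned in this mail. */
typedef enum {
    MODEL_WALRCV_STOPPED,
    MODEL_WALRCV_STARTING,   /* requested, but the process may not exist yet */
    MODEL_WALRCV_STREAMING
} ModelWalRcvState;

typedef struct {
    ModelWalRcvState state;
    int pid;                 /* 0 while no walreceiver process was launched */
} ModelWalRcv;

/* What the startup process looks at: STARTING already counts as active. */
bool model_walrcv_active(const ModelWalRcv *w)
{
    return w->state == MODEL_WALRCV_STARTING ||
           w->state == MODEL_WALRCV_STREAMING;
}

/* A stricter, purely illustrative check that also requires a real process. */
bool model_walrcv_has_process(const ModelWalRcv *w)
{
    return model_walrcv_active(w) && w->pid != 0;
}
```

For a state of MODEL_WALRCV_STARTING with pid 0, the first check reports
the walreceiver as active while the second does not; that window is where
the trouble described next begins.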

Now it begins to read data if new data has arrived. How does it check that?
If RecPtr, the position we are reading, is lower than
walrcv->receivedUpto, we can read the data, even if the walreceiver has not
started completely and the data is the OLD data containing the invalid
record.
It reads that data, hits the invalid record again, and stops the
walreceiver again (the walreceiver has not started completely and has no
pid, so this only sets walrcv->walRcvState = WALRCV_STOPPED). When the
walreceiver does start and enters WalReceiverMain(), it sees
walrcv->walRcvState == WALRCV_STOPPED, concludes that it has been shut down
by someone else, and just exits. So the walreceiver starts and exits again
and again.
Meanwhile the startup process starts the walreceiver, reads data (the
invalid record), and shuts down the walreceiver, also again and again.
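
The loop above can be reproduced with a small simulation. This is a sketch
under simplifying assumptions, not PostgreSQL code: Rcv, startup_round(),
walreceiver_entry(), and count_streaming_rounds() are hypothetical names,
every stale read is assumed to hit the invalid record, and received_upto
merely models walrcv->receivedUpto.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;
typedef enum { RCV_STOPPED, RCV_STARTING, RCV_STREAMING } RcvState;

typedef struct {
    RcvState state;
    Lsn received_upto;   /* models walrcv->receivedUpto */
} Rcv;

/* Startup-process side of one round; true if it stopped the walreceiver. */
static bool startup_round(Rcv *w, Lsn recptr)
{
    w->state = RCV_STARTING;            /* streaming has been requested */
    if (recptr < w->received_upto) {    /* stale mark says "data available" */
        w->state = RCV_STOPPED;         /* invalid record read again: stop */
        return true;
    }
    return false;                       /* would wait for new data instead */
}

/* Walreceiver side: at entry it exits if it has already been stopped. */
static bool walreceiver_entry(Rcv *w)
{
    if (w->state == RCV_STOPPED)
        return false;                   /* shut down by someone else: exit */
    w->state = RCV_STREAMING;
    return true;
}

/* Count how many of `rounds` attempts end with the walreceiver streaming. */
int count_streaming_rounds(Rcv *w, Lsn recptr, int rounds)
{
    int streamed = 0;
    for (int i = 0; i < rounds; i++) {
        startup_round(w, recptr);
        if (walreceiver_entry(w))
            streamed++;
    }
    return streamed;
}
```

With received_upto left above the read position (for example 0x6000000
against a read position of 0x5013730), the count stays at zero no matter
how many rounds are run, which matches the endless start-and-exit cycle.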

Why does restarting the PG standby server make it OK?
The startup process begins REDO, reads an invalid record, prints the error
message, starts the walreceiver via RequestXLogStreaming(), and switches to
reading from XLOG_FROM_STREAM. This is the first time the walreceiver is
started, so walrcv->receivedUpto = walrcv->receiveStart = recptr is set.
The startup process is ready to read new data, but RecPtr >=
walrcv->receivedUpto, so it waits for the walreceiver to get WAL from the
master.
So by restarting the PG standby server, we get the WAL from the master
instead of the WAL on disk.

With my fix, every time we start the walreceiver, the startup process waits
for new data instead of reading the OLD data, just like after restarting
the standby server.
So we can fix the problem online, and the walreceiver will not be lost.
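
The difference between the restart case and the online case can be sketched
like this. The patch itself is not reproduced in this message, so the
always_reset flag below is only one hypothetical way to get the behavior
described, not the actual change; RcvMarks, model_request_streaming(), and
model_must_wait() are names made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;

typedef struct {
    bool started_before;   /* streaming requested since server start? */
    Lsn receive_start;     /* models walrcv->receiveStart */
    Lsn received_upto;     /* models walrcv->receivedUpto */
} RcvMarks;

/*
 * Model of requesting streaming. With always_reset == false the received
 * mark is reset only on the first request, as described for a restart.
 * always_reset == true is the hypothetical variant in which every request
 * behaves like the first one.
 */
void model_request_streaming(RcvMarks *m, Lsn recptr, bool always_reset)
{
    if (always_reset || !m->started_before)
        m->received_upto = recptr;
    m->receive_start = recptr;
    m->started_before = true;
}

/* The startup process waits when nothing past recptr is marked received. */
bool model_must_wait(const RcvMarks *m, Lsn recptr)
{
    return recptr >= m->received_upto;
}
```

On a first request the model waits for the walreceiver. On a later request
with a stale received mark it does not wait and would read the old data,
unless the mark is reset on every request.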

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote on Wed, Dec 11, 2019 at 1:38 PM:

> At Tue, 10 Dec 2019 10:40:53 -0800, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
> wrote in
> > On Tue, Dec 10, 2019 at 3:06 AM jiankang liu <liujk1994(at)gmail(dot)com>
> wrote:
> >
> > > Start Walreceiver completely before shut down it on standby server.
> > >
> > > The walreceiver is shut down when an invalid record is read from the
> > > WAL streamed from the master. Then we retry from archive/pg_wal.
> > >
> > > After that, we start the walreceiver in RequestXLogStreaming() and
> > > read records from the WAL stream. But before the walreceiver starts,
> > > we read data from the file that was streamed last time and is present
> > > in pg_wal, because walrcv->receivedUpto > RecPtr and the WAL has
> > > actually been flushed to disk. Now we read the invalid record again.
> > > What happens next? We shut down the walreceiver and do it all again.
> > >
> >
> > I am missing something here, if walrcv->receivedUpto > RecPtr, why are we
> > getting / reading invalid record?
>
> I bet that the standby is connecting to a wrong master. For
> example, something like this happens when the master has been
> reinitialized from a backup and experienced another history, then the
> standby was initialized from the reborn master but the stale archive
> files on the standby are left alone.
>
> Anyway that cannot happen on correctly running replication set and
> what to do in the case is starting from a new basebackup of the
> master, making sure to erase stale archive files if any.
>
> About the proposed fix, it doesn't seem to cause the startup process to
> rewind WAL to that LSN. Even if that happens, it leads to no better
> than a broken database.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
