Re: Strange decreasing value of pg_last_wal_receive_lsn()

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: godjan • <g0dj4n(at)gmail(dot)com>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Strange decreasing value of pg_last_wal_receive_lsn()
Date: 2020-05-14 16:44:57
Message-ID: 20200514184457.48d58ef5@firost
Lists: pgsql-hackers

(please, the list policy is bottom-posting to keep history clean, thanks).

On Thu, 14 May 2020 07:18:33 +0500
godjan • <g0dj4n(at)gmail(dot)com> wrote:

> -> Why do you kill -9 your standby?
> Hi, it’s a Jepsen test for our HA solution. It checks that we don’t lose data
> in such a situation.

OK. This test is highly useful to stress data high availability and durability,
of course. However, how useful is it in the context of auto failover for
**service** high availability? If all your nodes are killed in the same
disaster, how and why should an automatic cluster manager take care of
restarting all nodes and picking the right one to promote?

> So, now we updated the logic as Michael said. All alive HA standbys now wait
> until they have replayed all the WAL they received, and then we use
> pg_last_wal_replay_lsn() to choose which standby will be promoted during
> failover.
>
> It fixed our trouble, but there is another one. Now we have to wait until all
> alive HA hosts finish replaying WAL before failing over. It might take a
> while (for example, when the WAL contains a record about splitting a b-tree).

Indeed, this is the concern I wrote about yesterday in a second mail on this
thread.
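As an aside, once every standby reports its replayed LSN, picking the most
advanced one is a simple comparison. A minimal sketch (helper names are mine,
and it assumes LSNs in their usual "X/Y" textual form):

```python
# Hypothetical sketch: pick the standby holding the highest replayed LSN.
# Assumes each standby reported pg_last_wal_replay_lsn() as the usual
# "X/Y" text form (two hex numbers: high and low 32 bits).

def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN like '16/B374D848' to a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def pick_candidate(replay_lsns: dict) -> str:
    """Return the standby name with the most advanced replayed LSN."""
    return max(replay_lsns, key=lambda name: lsn_to_int(replay_lsns[name]))

# Example: standby2 replayed further, so it is the promotion candidate.
standbys = {"standby1": "16/B374D848", "standby2": "16/B3751000"}
print(pick_candidate(standbys))  # standby2
```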

> We are looking for options that will allow us to find a standby that contains
> all data and replay all WAL only for this standby before failover.

Note that when you promote a node, it first replays all available WAL before
acting as a primary. So you can safely signal the promotion to the node and
wait for it to finish replaying and promote itself.
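For example, assuming you run PostgreSQL 12 or later, you can trigger the
promotion from SQL and have the call wait until it completes (on older
releases, `pg_ctl promote -w` does the same from the shell):

```sql
-- Trigger promotion on the chosen standby and wait up to 60 seconds for it
-- to complete; returns true once the node has finished replay and promoted.
SELECT pg_promote(wait => true, wait_seconds => 60);
```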

> Maybe you have ideas on how to keep the last actual value of
> pg_last_wal_receive_lsn()?

Nope, no clean and elegant idea. Once your instances are killed, maybe you can
force-flush the system cache (to secure the in-memory-only data) and read the
latest received WAL using pg_waldump?
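Something along these lines, as a rough sketch (the paths are hypothetical,
and it assumes the last segment on disk is intact):

```shell
# Hypothetical sketch: inspect the last WAL segment flushed to disk on a
# stopped standby. Assumes $PGDATA points at that standby's data directory.
LAST_SEG=$(ls "$PGDATA/pg_wal" | grep -E '^[0-9A-F]{24}$' | sort | tail -n 1)
# pg_waldump prints one line per record; the last line shows the last LSN.
pg_waldump "$PGDATA/pg_wal/$LAST_SEG" | tail -n 1
```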

But what if some more data are available from the archives, but were not
received through streaming replication because of a high lag?

> As I understand, the WAL receiver doesn’t write walrcv->flushedUpto to disk.

I'm not sure I understand what you mean here.
pg_last_wal_receive_lsn() reports the current value of walrcv->flushedUpto,
and walrcv->flushedUpto holds the latest LSN force-flushed to disk.

> > On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> > wrote:
> >
> >
> > (too bad the history has been removed to keep context)
> >
> > On Fri, 8 May 2020 15:02:26 +0500
> > godjan • <g0dj4n(at)gmail(dot)com> wrote:
> >
> >> I got it, thank you.
> >> Can you recommend what to use to determine which quorum standby should be
> >> promoted in such a case? We planned to use pg_last_wal_receive_lsn() to
> >> determine which has fresh data but if it returns the beginning of the
> >> segment on both replicas we can’t determine which standby confirmed that
> >> write transaction to disk.
> >
> > Wait, pg_last_wal_receive_lsn() only decreased because you killed your
> > standby.
> >
> > pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The
> > latter is set to the beginning of the requested segment only during the
> > first walreceiver startup or after a timeline fork:
> >
> > 	/*
> > 	 * If this is the first startup of walreceiver (on this timeline),
> > 	 * initialize flushedUpto and latestChunkStart to the starting
> > 	 * point.
> > 	 */
> > 	if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli)
> > 	{
> > 		walrcv->flushedUpto = recptr;
> > 		walrcv->receivedTLI = tli;
> > 		walrcv->latestChunkStart = recptr;
> > 	}
> > 	walrcv->receiveStart = recptr;
> > 	walrcv->receiveStartTLI = tli;
> >
> > After a primary loss, as long as the standbys are up and running, it is
> > fine to use pg_last_wal_receive_lsn().
> >
> > Why do you kill -9 your standby? What am I missing? Could you explain the
> > use case you are working on to justify this?
> >
> > Regards,

--
Jehan-Guillaume de Rorthais
Dalibo
