Re: Walsender may fail to send wal to the end.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: sfrost(at)snowman(dot)net
Cc: michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Walsender may fail to send wal to the end.
Date: 2021-03-30 06:42:05
Message-ID: 20210330.154205.1619318594309963027.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 29 Mar 2021 11:41:32 -0400, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> Greetings,
>
> * Kyotaro Horiguchi (horikyota(dot)ntt(at)gmail(dot)com) wrote:
> > At Mon, 29 Mar 2021 14:47:33 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> > > On Fri, Mar 26, 2021 at 10:16:40AM -0700, Andres Freund wrote:
> > > > On 2021-03-26 18:20:14 +0900, Kyotaro Horiguchi wrote:
> > > > > This is because XLogSendPhysical detects that the WAL segment it is
> > > > > currently reading was removed by the shutdown checkpoint. However,
> > > > > there's no fear of WAL segments being overwritten at that point.
> > > > >
> > > > > So I think we can omit the call to CheckXLogRemoved() while
> > > > > MyWalSnd->state is WALSNDSTATE_STOPPING, because that state is
> > > > > entered only after the shutdown checkpoint completes.
> > > > >
> > > > > Of course that doesn't help if the walsender was running two segments
> > > > > behind; there could still be a small window for the failure. But it
> > > > > is a great help in the case of being just one segment behind.
> > > >
> > > > -1. This seems like a bandaid to make a broken configuration work a tiny
> > > > bit better, without actually being meaningfully better.
> > >
> > > Agreed. Still, wouldn't it be better to avoid such configurations and
> > > protect things a bit with a check on the new value?
>
> I have a hard time agreeing that this is somehow a 'broken'
> configuration, instead it looks like a race condition that wasn't
> considered and should be addressed. If there's zero lag then we really
> should allow the final WAL to get sent to the replica.

My unstated point was about switching primary/secondary roles in a
replication set where both hosts have separate archives, using the steps
"fast shutdown primary" -> "promote standby" -> "attach the old primary as
the new standby", without needing to synchronize the old primary's archive
with that of the new standby before starting the new standby. I thought
that should work even if wal_keep_size = 0.

> > The repro was a bit artificial, but the symptom happens without
> > pg_switch_wal() and with no load; it is caused just by shutting down
> > the primary. If it is normal behavior for walsenders to fail to send
> > the last shutdown record to the standby during a fast shutdown, we
> > should at least refuse to start walsenders when wal_keep_size = 0.
> >
> > I can think of two ways to do that.
>
> Both of which will break things for people, so this certainly isn't a
> great approach, and besides, if archiving is happening with
> archive_command and the replica has a restore command then it should be

Right.

> able to follow that just fine, no? So we'd have to also check if
> archive_command has been set up and hope the admin has a restore

Yeah, that sounds stupid (or kind of impossible).

> command. Having to go through that dance instead of just making sure to
> push out the last WAL to the replica seems a bit silly though.

Sounds reasonable to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
