Re: Walsender may fail to send wal to the end.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: andres(at)anarazel(dot)de, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Walsender may fail to send wal to the end.
Date: 2021-03-29 09:07:16
Message-ID: 20210329.180716.1610312260169575415.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 29 Mar 2021 14:47:33 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Fri, Mar 26, 2021 at 10:16:40AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2021-03-26 18:20:14 +0900, Kyotaro Horiguchi wrote:
> > > This is because XLogSendPhysical detects removal of the wal segment
> > > > currently reading by shutdown checkpoint. However, there's no fear of
> > > overwriting of WAL segments at the time.
> > >
> > > So I think we can omit the call to CheckXLogRemoved() while
> > > > MyWalSnd->state is WALSNDSTATE_STOPPING because the state comes after
> > > the shutdown checkpoint completes.
> > >
> > > Of course that doesn't help if walsender was running two segments
> > > behind. There still could be a small window for the failure. But it's
> > > a great help to save the case of just 1 segment behind.
> >
> > -1. This seems like a bandaid to make a broken configuration work a tiny
> > bit better, without actually being meaningfully better.
>
> Agreed. Still, wouldn't it be better to avoid such configurations and
> protect a bit things with a check on the new value?

The repro was a bit artificial, but the symptom also occurred without
pg_switch_wal() and with no load. It was caused just by shutting down
the primary. If it is normal behavior for walsenders to fail to send
the final shutdown record to the standby during a fast shutdown, we
should refuse to start at least the walsender if wal_keep_size = 0.

I can think of two ways to do that.

1. Refuse to start the server if wal_keep_size = 0 when max_wal_senders > 0.

2. Refuse to start a walsender if wal_keep_size = 0.

2 looks broken. 1 is somewhat annoying. However, since
max_wal_senders > 0 already requires wal_level > minimal, perhaps we
can accept that restriction?

<start server>

FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "replica" or "logical"

<Mmm. wal_level fixed, then retry starting the server>

FATAL: WAL streaming (max_wal_senders > 0) requires wal_keep_size to be at least 1MB

<Oops!>

Of course, we can list all incompatible parameters at once.

FATAL: WAL streaming (max_wal_senders > 0) requires wal_level "replica" or "logical" and wal_keep_size to be at least 1MB
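
To illustrate the combined check, here is a rough standalone sketch. It
is not a patch against the PostgreSQL tree: the WalSettings struct, the
WalLevel enum and check_wal_streaming_settings() are hypothetical
stand-ins for the real GUC variables and for wherever the check would
actually live, and wal_keep_size is assumed to be held in megabytes. It
only shows how every violated requirement could be gathered into one
message instead of being reported one restart at a time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the GUCs discussed above (not the real declarations). */
typedef enum
{
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_REPLICA,
    WAL_LEVEL_LOGICAL
} WalLevel;

typedef struct
{
    int      max_wal_senders;
    WalLevel wal_level;
    int      wal_keep_size_mb;   /* assumption: value kept in megabytes */
} WalSettings;

/*
 * Returns true if the settings are compatible with WAL streaming.
 * Otherwise returns false and writes a single message into buf that
 * names every violated requirement.
 */
bool
check_wal_streaming_settings(const WalSettings *s, char *buf, size_t buflen)
{
    bool bad_level;
    bool bad_keep;

    if (buflen > 0)
        buf[0] = '\0';

    /* Nothing to enforce when WAL streaming is disabled. */
    if (s->max_wal_senders <= 0)
        return true;

    bad_level = (s->wal_level == WAL_LEVEL_MINIMAL);
    bad_keep = (s->wal_keep_size_mb < 1);

    if (!bad_level && !bad_keep)
        return true;

    /* Join the violated requirements with " and " when both apply. */
    snprintf(buf, buflen,
             "WAL streaming (max_wal_senders > 0) requires %s%s%s",
             bad_level ? "wal_level \"replica\" or \"logical\"" : "",
             (bad_level && bad_keep) ? " and " : "",
             bad_keep ? "wal_keep_size to be at least 1MB" : "");
    return false;
}
```

A caller would pass the current settings and a buffer, then emit the
buffer as the FATAL message when the function returns false. With only
one setting wrong, the message reduces to the corresponding single-line
form shown earlier.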

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
