Re: Stronger safeguard for archive recovery not to miss data

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: osumi(dot)takamichi(at)fujitsu(dot)com
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, david(at)pgmasters(dot)net, pgsql-hackers(at)lists(dot)postgresql(dot)org, laurenz(dot)albe(at)cybertec(dot)at
Subject: Re: Stronger safeguard for archive recovery not to miss data
Date: 2021-04-06 06:24:21
Message-ID: 20210406.152421.159594903939861443.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 6 Apr 2021 04:11:35 +0000, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> wrote in
> On Tuesday, April 6, 2021 9:41 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
> > On 2021/04/05 23:54, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> > >> This makes me think that we should document this risk.... Thought?
> > > +1. We should notify the risk when user changes
> > > the wal_level higher than minimal to minimal to invoke a carefulness
> > > of user for such kind of operation.
> >
> > I removed the HINT message "or recover to the point in ..." and added the
> > following note into the docs.
> >
> > Note that changing <varname>wal_level</varname> to
> > <literal>minimal</literal> makes any base backups taken before
> > unavailable for archive recovery and standby server, which may
> > lead to database loss.
> Thank you for updating the patch. Let's make the sentence more strict.
>
> My suggestion for this explanation is
> "In order to prevent database corruption, changing
> wal_level to minimal from higher level in the middle of
> WAL archiving requires careful attention. It makes any base backups
> taken before the operation unavailable for archive recovery
> and standby server. Also, it may lead to whole database loss when
> archive recovery fails with an error for that change.
> Take a new base backup immediately after making wal_level back to higher level."

The first sentense looks like somewhat nanny-ish. The database is not
corrupt at the time of this error. We just lose updates after the last
read segment at this point. As Fujii-san said, we can continue
recoverying using crash recovery and we will reach having a corrupt
database after that.

About the last sentense, I prefer more flat wording, such as "You need
to take a new base backup..."

> Then, we can be consistent with our new hint message,
> "Use a backup taken after setting wal_level to higher than minimal.".
>
> Is it better to add something similar to "Take an offline backup when you stop the server
> and change the wal_level" around the end of this part as another option for safeguard, also?

Backup policy is completely a matter of DBAs. If flipping wal_level
alone highly causes unstartable corruption,,, I think it is a bug.

> For the performance technique part, what we need to explain is same.

Might be good, but in simpler wording.

> Another minor thing I felt we need to do might be to add double quotes to wrap minimal in errhint.

Since the error about hot_standby has gone, either will do for me.

> Other errhints do so when we use it in a sentence.
>
> There is no more additional comment from me !

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-04-06 06:49:20 Re: Replication slot stats misgivings
Previous Message Noah Misch 2021-04-06 06:20:56 Re: policies with security definer option for allowing inline optimization