RE: Stronger safeguard for archive recovery not to miss data

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>
Subject: RE: Stronger safeguard for archive recovery not to miss data
Date: 2021-04-05 14:54:13
Message-ID: OSBPR01MB48882EF2845B2C152BBBA9F7ED779@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Monday, April 5, 2021 9:16 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2021/04/05 16:13, Kyotaro Horiguchi wrote:
> > At Mon, 5 Apr 2021 12:34:53 +0900, Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> >>
> >>
> >> On 2021/04/04 11:58, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> >>>> IMO it's better to comment why this server restart is necessary.
> >>>> As far as I understand correctly, this is necessary to ensure the
> >>>> WAL file containing the record about the change of wal_level (to
> >>>> minimal) is archived, so that the subsequent archive recovery will
> >>>> be able to replay it.
> >>> OK, added some comments. Further, I felt the way I wrote this part
> >>> was not good at all and not self-evident, so developers who read this
> >>> test would feel uneasy about that point.
> >>> So I fixed that test a little so that we can be more confident that
> >>> the WAL file is archived.
> >>
> >> LGTM. Thanks for updating the patch!
> >>
> >> Attached is the updated version of the patch. I applied the following
> >> changes.
> >
> > + errhint("Use a backup taken after setting wal_level to higher than minimal "
> > + "or recover to the point in time before wal_level was changed to minimal even though it may cause data loss.")));
> >
> > Looking at the HINT message, I thought that it's hard to tell up to
> > which point I should recover.
>
> Yes. And, what's worse, when archive recovery finds WAL generated with
> wal_level=minimal and fails, "minimal" is saved in pg_control's wal_level.
> This means that, in that case, subsequent archive recovery always fails at
> the beginning of recovery (before entering the WAL replay main loop).
> So even if recovery_target_lsn is specified, archive recovery fails before
> checking that. No recovery target setting has any effect in that case.
>
> Maybe we can avoid this, for example, by changing xlog_redo() so that it calls
> CheckRequiredParameterValues() before UpdateControlFile().
> But I'm not sure if this change is safe. Probably we need more time to
> consider this, but there is not much time left at this stage.
>
> At least the HINT message "or recover to the point in time before wal_level
> was changed to minimal even though it may cause data loss." should be
> removed because it's not helpful at all...
>
> Ok, so if archive recovery finds WAL generated with wal_level=minimal and
> fails, and there is also no backup taken after wal_level was set to higher
> than minimal, basically [1] we lose the whole database. I think that those
> who set wal_level to minimal understand that this setting can cause data
> loss, for example, any data loaded with wal_level=minimal may be lost later.
> But I'm afraid that they might not understand the risk of losing the whole
> database.
>
> Even if they take a new backup just after they set wal_level to higher than
> minimal, there is still the risk of losing the whole database until the
> backup is completed.
>
> This makes me think that we should document this risk.... Thoughts?
+1. We should warn users about this risk when they change
wal_level from a higher setting to minimal,
so that they take appropriate care with this kind of operation.
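
For reference, the ordering problem described above (the control file's
wal_level being saved before the parameter check fails) can be illustrated
with a toy model. This is only a sketch and not PostgreSQL code: every type
and function name below is hypothetical, and it only loosely mirrors the
roles of UpdateControlFile() and CheckRequiredParameterValues() mentioned in
the quoted text. It says nothing about whether reordering the real calls in
xlog_redo() is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are PostgreSQL's real types or functions. */
typedef enum { TOY_WAL_MINIMAL, TOY_WAL_REPLICA } ToyWalLevel;

typedef struct
{
    ToyWalLevel wal_level;      /* models the wal_level saved in pg_control */
} ToyControlFile;

/* Models the check: archive recovery cannot proceed with wal_level=minimal. */
bool
toy_check_required(ToyWalLevel level)
{
    return level != TOY_WAL_MINIMAL;
}

/* Order described in the thread: save the new value, then check it. */
bool
toy_redo_update_then_check(ToyControlFile *cf, ToyWalLevel from_wal)
{
    assert(cf != NULL);
    cf->wal_level = from_wal;               /* saved even if the check fails */
    return toy_check_required(from_wal);
}

/* Alternative order floated in the thread: check first, save only on success. */
bool
toy_redo_check_then_update(ToyControlFile *cf, ToyWalLevel from_wal)
{
    assert(cf != NULL);
    if (!toy_check_required(from_wal))
        return false;                       /* stored value is left untouched */
    cf->wal_level = from_wal;
    return true;
}

/* Models the check at the beginning of the next archive recovery attempt. */
bool
toy_recovery_can_start(const ToyControlFile *cf)
{
    assert(cf != NULL);
    return toy_check_required(cf->wal_level);
}
```

In this model, after toy_redo_update_then_check() fails on a "minimal"
record, toy_recovery_can_start() also fails, whatever recovery target is
chosen. With toy_redo_check_then_update(), the stored value stays at the
higher level, so a later attempt can at least begin.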

Best Regards,
Takamichi Osumi
