RE: Stronger safeguard for archive recovery not to miss data

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>
Subject: RE: Stronger safeguard for archive recovery not to miss data
Date: 2021-04-01 03:45:57
Message-ID: OSBPR01MB488886DB12B35261CDB44DBAED7B9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi,

On Wednesday, March 31, 2021 3:06 PM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Wed, 31 Mar 2021 15:03:28 +0900 (JST), Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > At Wed, 31 Mar 2021 02:11:48 +0900, Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> > > > So, I would revert all the changes in xlog.c except changing the
> > > > warning to an error:
> > > > > - ereport(WARNING,
> > > > > -         (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
> > > > > -          errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
> > > > > + ereport(FATAL,
> > > > > +         (errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
> > > > > +          errdetail("This happens if you temporarily set wal_level=minimal on the server."),
> > > > > +          errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));
> > > I guess that users usually encounter this error because they have
> > > not taken base backups yet after setting wal_level to higher than
> > > minimal and have to use the old base backup for archive recovery. So
> > > I'm not sure how much only this HINT is helpful for them. Isn't it
> > > better to append something like "If there is no such backup, recover
> > > to the point in time before wal_level is set to minimal even though
> > > which cause data loss, to start the server." into HINT?
> >
> > I agree that the hint doesn't make sense.
>
> For the primary case,
>
> > HINT: Restart with archive recovery turned off. The past backups are no
> > longer usable. You need to take a new one after restart.
> >
> > If it's the replica case, it would be..
> >
> > HINT: Start from a fresh standby created from the current primary server.
>
> Start from a fresh backup...
Thank you for sharing your ideas about the hint. We absolutely need to change the message.
In my opinion, combining your basic idea with Fujii-san's would be best.

I updated the patch and made v05. The changes are:

* reworded the errhint, although it has become long!
* fixed the typo in the TAP test
* reverted my past changes so that the conditions in CheckRequiredParameterValues are not changed
* renamed the test file to 024_archive_recovery.pl because two test files have been added
since the last update of this patch
* ran pgindent to check my alignment again
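
For readers following the thread, the behavior the patch is aiming for can be sketched roughly as below. This is a standalone illustration under stated assumptions, not code from the attached patch: the enum types and the function check_wal_level_for_recovery are hypothetical stand-ins for the server's control-file state and for the severity handed to ereport(). It only models the idea that archive recovery stops with FATAL when the WAL was generated with wal_level=minimal, while the condition that triggers the check is left as it was.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the wal_level recorded in the control file. */
typedef enum
{
    SKETCH_WAL_LEVEL_MINIMAL,
    SKETCH_WAL_LEVEL_REPLICA,
    SKETCH_WAL_LEVEL_LOGICAL
} SketchWalLevel;

/* Hypothetical stand-in for the severity that would be passed to ereport(). */
typedef enum
{
    SKETCH_OK,
    SKETCH_FATAL
} SketchSeverity;

/*
 * Decide how recovery should react to the recorded wal_level.
 *
 * Assumption for this sketch: the check applies only when archive
 * recovery was requested, and WAL generated with wal_level=minimal is
 * then treated as a fatal condition instead of a warning.
 */
SketchSeverity
check_wal_level_for_recovery(bool archive_recovery_requested,
                             SketchWalLevel wal_level)
{
    if (archive_recovery_requested &&
        wal_level == SKETCH_WAL_LEVEL_MINIMAL)
        return SKETCH_FATAL;

    return SKETCH_OK;
}
```

In this sketch, a call with archive recovery requested and SKETCH_WAL_LEVEL_MINIMAL yields SKETCH_FATAL; every other combination yields SKETCH_OK.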

By the way, when I build postgres with this patch and the --enable-coverage option,
the results of RT become unstable. Does anyone know the reason?
When it fails, I get stderr output like the following:

t/001_start_stop.pl .. 10/24
# Failed test 'pg_ctl start: no stderr'
# at t/001_start_stop.pl line 48.
# got: 'profiling:/home/k5user/new_disk/recheck/PostgreSQL-Source-Dev/src/backend/executor/execMain.gcda:Merge mismatch for function 15
# '
# expected: ''
t/001_start_stop.pl .. 24/24 # Looks like you failed 1 test of 24.
t/001_start_stop.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/24 subtests

A similar phenomenon was observed in [1], and the solution there
seems to be upgrading gcc to a version higher than 7. I did so, but I still get this unstable error with
--enable-coverage. It does not happen when I remove the --enable-coverage option;
in that case make check-world passes.

[1] - https://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg323147.html

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
stronger_safeguard_for_archive_recovery_v05.patch application/octet-stream 6.0 KB
