RE: Stronger safeguard for archive recovery not to miss data

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Laurenz Albe' <laurenz(dot)albe(at)cybertec(dot)at>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Stronger safeguard for archive recovery not to miss data
Date: 2021-01-18 07:34:44
Message-ID: OSBPR01MB4888E218E70404F7473AA803EDA40@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi, Laurenz

On Friday, January 15, 2021 12:56 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> On Tue, 2020-12-08 at 03:08 +0000, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> > On Thursday, November 26, 2020 4:29 PM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > At Thu, 26 Nov 2020 07:18:39 +0000, "osumi(dot)takamichi(at)fujitsu(dot)com"
> > > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote in
> > > > The attached patch is intended to prevent a scenario that archive
> > > > recovery hits WALs which come from wal_level=minimal and the
> > > > server continues to work, which was discussed in the thread of [1].
> > >
> > > Perhaps we need the TAP test that conducts the above steps.
> >
> > I added the TAP tests to reproduce and share the result, using the
> > case of 6-(1) described above.
> > Here, I created a new file for it because the purposes of other files
> > in src/recovery didn't match the purpose of my TAP tests perfectly.
> > If you are dubious about this idea, please have a look at the comments
> > in each file.
> >
> > When the attached patch is applied,
> > my TAP tests are executed like other ones like below.
> >
> > t/018_wal_optimize.pl ................ ok
> > t/019_replslot_limit.pl .............. ok
> > t/020_archive_status.pl .............. ok
> > t/021_row_visibility.pl .............. ok
> > t/022_archive_recovery.pl ............ ok
> > All tests successful.
> >
> > Also, I confirmed that there's no regression by make check-world.
> > Any comments ?
>
> The patch applies and passes regression tests, as well as the new TAP test.
Thank you for checking.

> I think this should be backpatched, since it fixes a bug.
Agreed.

> I am not quite happy with the message:
>
> FATAL: WAL was generated with wal_level=minimal, data may be missing
> HINT: This happens if you temporarily set wal_level=minimal without taking a
> new base backup.
>
> This sounds too harmless to me and doesn't give the user a clue what would be
> the best way to proceed.
>
> Suggestion:
>
> FATAL: WAL was generated with wal_level=minimal, cannot continue
> recovering
Adopted.

> DETAIL: This happens if you temporarily set wal_level=minimal on the primary
> server.
> HINT: Create a new standby from a new base backup after setting
> wal_level=replica.
Thanks for your suggestion.
I noticed that this message needs to cover both forms of archive recovery,
that is, plain archive recovery and standby mode. So I combined your
suggestion with that point. Please have a look at the updated patch.
I also extended the new TAP tests to exercise both cases.
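
For illustration only, the intended behavior can be sketched as a small
Python model. This is not the patch itself: the names ParameterChange,
RecoveryAborted and replay are hypothetical, and the single archive_recovery
flag is an assumption that stands in for both plain archive recovery and
standby mode. Crash recovery is modeled as unaffected.

```python
from dataclasses import dataclass
from typing import Iterable

WAL_LEVELS = ("minimal", "replica", "logical")


class RecoveryAborted(Exception):
    """Hypothetical stand-in for a FATAL error that stops recovery."""


@dataclass
class ParameterChange:
    # Hypothetical stand-in for a WAL record that reports the wal_level
    # in effect on the server that generated the WAL.
    wal_level: str


def replay(records: Iterable[ParameterChange], archive_recovery: bool) -> int:
    """Replay records and return how many were applied.

    When archive_recovery is True (assumed to cover both plain archive
    recovery and standby mode), refuse to go past WAL that was generated
    with wal_level=minimal.
    """
    replayed = 0
    for rec in records:
        if rec.wal_level not in WAL_LEVELS:
            raise ValueError(f"unknown wal_level: {rec.wal_level}")
        if archive_recovery and rec.wal_level == "minimal":
            raise RecoveryAborted(
                "WAL was generated with wal_level=minimal, "
                "cannot continue recovering"
            )
        replayed += 1
    return replayed
```

In this model, replay([ParameterChange("replica"), ParameterChange("minimal")],
archive_recovery=True) raises RecoveryAborted at the second record, while the
same input with archive_recovery=False is replayed to the end.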

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
stronger_safeguard_for_archive_recovery_v03.patch application/octet-stream 5.0 KB
