Re: [patch] Fix pg_checksums to allow checking of offline base backup directories

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [patch] Fix pg_checksums to allow checking of offline base backup directories
Date: 2020-04-07 08:07:44
Message-ID: 20200407080744.GB6655@paquier.xyz
Lists: pgsql-hackers

On Mon, Apr 06, 2020 at 01:26:17PM +0200, Michael Banck wrote:
> I think we can allow checking of base backups if we make sure
> backup_label exists in the data directory or am I missing something?
> I think we need to have similar checks about pages changed during base
> backup, so this patch ignores checksum failures between the checkpoint
> LSN and (as a reasonable upper bound) the last LSN of the last existing
> transaction log file. If no xlog files exist (the --wal-method=none
> case), the last LSN of the checkpoint WAL segment is taken.

Have you considered that backup_label files can exist in the data
directory of a live cluster? That's not the case with pg_basebackup
or with non-exclusive backups taken through the SQL interface, but it
is possible while an exclusive backup started through the SQL
interface is running.
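
To illustrate the point above, here is a minimal sketch of why the presence
of backup_label alone cannot decide whether a directory is an offline
backup. The function name and the returned labels are hypothetical, and the
postmaster.pid test is only a heuristic assumed for illustration: a stale
pid file can survive a crash, and a copied directory may lack one even
though the source cluster is running.

```python
from pathlib import Path


def classify_data_directory(datadir: str) -> str:
    """Heuristically classify a data directory (hypothetical helper).

    backup_label by itself is ambiguous: it is present in a base backup,
    but also in a live cluster while an exclusive backup is running.
    """
    root = Path(datadir)
    has_label = (root / "backup_label").is_file()
    has_pidfile = (root / "postmaster.pid").is_file()

    if has_pidfile:
        # A pid file suggests a running (or crashed) cluster, whether or
        # not backup_label is also present.
        return "possibly-live"
    if has_label:
        # No pid file but a backup_label: consistent with a base backup,
        # though this is not proof.
        return "possibly-backup"
    return "unknown"
```

Neither result is authoritative; the sketch only shows that a check based on
backup_label needs at least one more signal before a tool could trust it.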

FWIW, my take on this matter is that you should consider checksum
verification as one step to check the sanity of a base backup, meaning
that you have to restore the base backup first, then let it reach its
consistent LSN position, and finally stop the cluster cleanly to make
sure that everything is safely flushed on disk and consistent.
Attempting to verify checksums from a raw base backup would most
likely lead to false positives, and my guess is that your patch has
issues in this area. A hint from a quick glance: the code path that
sets insertLimitLSN does not use any APIs from xlogreader.h.
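
The sequence described above (restore, recover to consistency, stop
cleanly, then verify) could be sketched roughly as follows. This is only an
illustration under stated assumptions: the backup has already been restored
into datadir, the WAL needed for recovery is available, and pg_ctl and
pg_checksums are on PATH. The function names are hypothetical, and whether
"pg_ctl start -w" returning means the consistent point has been reached
should be confirmed from the server log.

```python
import subprocess
from typing import List


def verification_commands(datadir: str) -> List[List[str]]:
    """Build the command sequence for checking a restored base backup.

    Hypothetical helper: assumes the backup was already restored into
    `datadir` and that the WAL required for recovery is available.
    """
    return [
        # Start the restored cluster and wait for startup to complete.
        ["pg_ctl", "start", "-D", datadir, "-w"],
        # Stop it cleanly so that everything is flushed to disk.
        ["pg_ctl", "stop", "-D", datadir, "-m", "fast", "-w"],
        # Verify checksums on the now-offline, cleanly stopped cluster.
        ["pg_checksums", "--check", "-D", datadir],
    ]


def verify_restored_backup(datadir: str) -> None:
    """Run each step in order, aborting on the first failing command."""
    for cmd in verification_commands(datadir):
        subprocess.run(cmd, check=True)
```

Running the checksum pass only after the clean stop is what avoids the
false positives a raw base backup would most likely produce.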
--
Michael
