Re: More issues with pg_verify_checksums and checksum verification in base backups

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: More issues with pg_verify_checksums and checksum verification in base backups
Date: 2018-11-01 11:14:40
Message-ID: CAA4eK1+nD772t9Ok_Gy-WA+4CEe2o_c2XJz9J5y70ZooYWGSpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 21, 2018 at 7:12 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> Hi all,
>
> This is a follow-up of the following thread:
> https://www.postgresql.org/message-id/20181012010411.re53cwcistcpip5j@alap3.anarazel.de
>
> In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and
> the buildfarm has immediately complained about files generated by
> EXEC_BACKEND missing, causing pg_verify_checksums to fail for such
> builds. An extra issue has been noticed by Andres in the tool: it
> misses to check for temporary files, hence files like
> base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the
> tool to fail. After a crash, those files would not be removed, and
> stopping the instance would still let them around.
>
> pg_verify_checksums used first a blacklist to decide if files have
> checksums or not. So with this approach all files should have checksums
> except files like pg_control, pg_filenode.map, PG_VERSION, etc.
>
> d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
> builds. After discussion, Andres has pointed out that some extensions
> may want to drop files within global/ or base/ as part of their logic.
> cstore_fdw was mentioned but if you look at their README that's not the
> case. However, I think that Andres argument is pretty good as with
> pluggable storage we should allow extensions to put custom files for
> different tablespaces. So this commit has changed the logic of
> pg_verify_checksums to use a whitelist, which assumes that only normal
> relation files have checksums:
> <digits>
> <digits>.<segment>
> <digits>_<forkname>
> <digits>_<forkname>.<segment
>
> After more discussion on the thread mentioned above, Stephen has pointed
> out that base backups use the same blacklist logic when verifying
> checksums. This has the same problem with EXEC_BACKEND-specific files,
> resulting in spurious warnings (that's an example!) which are confusing
> for the user:
> $ pg_basebackup -D popo
> WARNING: cannot verify checksum in file "./global/config_exec_params",
> block 0: read buffer size 5 and page size 8192 differ
>
> Folks on this thread agreed that both pg_verify_checksums and checksum
> verification for base backups should use the same logic. It seems to me
> that using a whitelist-based approach for both is easier to maintain as
> the patterns of files supporting checksums is more limited than files
> which do not support checksums. So this way we allow extensions
> bundling custom files to still work with pg_verify_checksums and
> checksum verification in base backups.
>

This sounds like a good argument for having a whitelist approach, but
is it really a big problem if a user gets warning for files that the
utility is not able to verify checksums for? I think in some sense
this message can be useful to the user as it can allow him to know
which files are not verified by the utility for some form of
corruption. I guess one can say that users might not be interested in
this information in which case such a check could be optional as you
seem to be suggesting in the following paragraph.

> Something else which has been discussed on this thread is that we should
> have a dedicated API to decide if a file has checksums or not, and if it
> should be skipped in both cases. That's work only for HEAD though, so
> we need to do something for HEAD and v11, which is simple.
>

+1.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2018-11-01 11:28:11 Re: FETCH FIRST clause WITH TIES option
Previous Message Alexander Kukushkin 2018-11-01 10:58:52 Re: Connection slots reserved for replication