pgsql: Change how a base backup decides which files have checksums.

From: Robert Haas <rhaas(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Change how a base backup decides which files have checksums.
Date: 2023-11-14 16:06:15
Message-ID: E1r2vvb-005R2D-9H@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Change how a base backup decides which files have checksums.

Previously, it thought that any plain file located under global, base,
or a tablespace directory had checksums unless it was in a short list
of excluded files. Now, it thinks that files in those directories have
checksums if parse_filename_for_nontemp_relation says that they are
relation files. (Temporary relation files don't matter because they're
excluded from the backup anyway.)

This changes the behavior if you have stray files not managed by
PostgreSQL in the relevant directories. Previously, you'd get some
kind of checksum-related complaint if such files existed, assuming
that the cluster had checksums enabled and that the base backup
wasn't run with NOVERIFY_CHECKSUMS. Now, you won't get those
complaints any more. That seems like an improvement to me, because
those files were presumably not created by PostgreSQL and so there
is no reason to think that they would be checksummed like a
PostgreSQL relation file. (If we want to complain about such files,
we should complain about them existing at all, not just about their
checksums.)

The point of this change is to make the code more consistent.
sendDir() was already calling parse_filename_for_nontemp_relation()
as part of an effort to determine which files to include in the
backup. So, it already had the information about whether a certain
file was a relation file. sendFile() then used a separate method,
embodied in is_checksummed_file(), to make what is essentially
the same determination. It's better not to make the same decision
using two different methods, especially in closely-related code.

Patch by me. Reviewed by Dilip Kumar and Álvaro Herrera. Thanks
also to Jakub Wartak and Peter Eisentraut for comments, suggestions,
and testing on the larger patch set of which this is a part.

Discussion: http://postgr.es/m/CAFiTN-snhaKkWhi2Gz5i3cZeKefun6sYL==wBoqqnTXxX4_mFA@mail.gmail.com
Discussion: http://postgr.es/m/202311141312.u4qx5gtpvfq3@alvherre.pgsql

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/025584a168a4b3002e19350bb8db0ebf1fd10235

Modified Files
--------------
src/backend/backup/basebackup.c | 172 +++++++++++++---------------------------
1 file changed, 55 insertions(+), 117 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Alvaro Herrera 2023-11-14 16:49:59 Re: pgsql: doc: fix wording describing the checkpoint_flush_after GUC
Previous Message Dean Rasheed 2023-11-14 11:01:52 pgsql: Support +/- infinity in the interval data type.